In my SSIS package I am using a data flow task to extract data from SQL Server and put it into a dataset with the following schema:
Column1 Int32
Column2 Object
Column3 Object
Column4 String
Column5 Double
That step seems to work well. In the foreach editor I mapped the columns to variables like this:
VARIABLE | INDEX
User::Column1 | 0
User::Column2 | 1
User::Column3 | 2
User::Column4 | 3
User::Column5 | 4
When I run the package I get the following error on the foreach task:
Error: The enumerator failed to retrieve element at index "4".
Error: ForEach Variable Mapping number 5 to variable "User::Column5" cannot be applied.
There are no null values in Column5 and I can clearly see all 5 columns in the query when I run it against the database. Any assistance is greatly appreciated!
I finally found the problem. The target dataset in the data flow task was dropping the last column for some reason. Once I recreated the dataset destination everything worked.
I have a dataframe DF as below. Based on the Issue column and the Datatype column, I want to create a dynamic query.
If the Issue column is YES, check the Datatype: if it is StringType, add trim(DiffColumnName) to the query; if the Datatype is IntegerType, do some other operation such as round(COUNT,2).
For the columns where Issue is NO, do nothing and select the column itself.
The query should look like this:
Select DEST_COUNTRY_NAME, trim(ORIGIN_COUNTRY_NAME),round(COUNT,2)
+-------------------+-----------+-----+
| DiffColumnName| Datatype|Issue|
+-------------------+-----------+-----+
| DEST_COUNTRY_NAME| StringType| NO|
|ORIGIN_COUNTRY_NAME| StringType| YES|
| COUNT|IntegerType| YES|
+-------------------+-----------+-----+
I am not sure whether I should use an if/else condition here, a case statement, or create a UDF. Also, my dataframe (i.e. its columns) is dynamic and will change every time.
I need some suggestions on how to proceed here. Thanks.
This can be accomplished using the following piece of code.
Derive the new column by applying the required operations
Use collect_list to aggregate the values to an array
Format the output using concat_ws and concat
import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF on a local Seq
val origDF = Seq(("DEST_COUNTRY_NAME","StringType","NO"),
  ("ORIGIN_COUNTRY_NAME","StringType","YES"),
  ("COUNT","IntegerType","YES"),
  ("TESTCOL","StringType","NO")
).toDF("DiffColumnName","Datatype","Issue")
// trim(...) for string columns flagged YES, round(...,2) for integer columns flagged YES,
// and the bare column name when Issue is NO
val finalDF = origDF.withColumn("newCol",
  when(col("Issue") === "YES" && col("Datatype") === "StringType", concat(lit("trim("), col("DiffColumnName"), lit(")")))
  .when(col("Issue") === "YES" && col("Datatype") === "IntegerType", concat(lit("round("), col("DiffColumnName"), lit(",2)")))
  .when(col("Issue") === "NO", col("DiffColumnName")))
// collect the expressions into an array and stitch them into a single SELECT statement
finalDF.agg(collect_list("newCol").alias("queryout")).select(concat(lit("select "), concat_ws(",", col("queryout")))).show(false)
I included an additional column in the data for testing, and it gives me the desired output.
+-------------------------------------------------------------------------+
|concat(select , concat_ws(,, queryout)) |
+-------------------------------------------------------------------------+
|select DEST_COUNTRY_NAME,trim(ORIGIN_COUNTRY_NAME),round(COUNT,2),TESTCOL|
+-------------------------------------------------------------------------+
I'm setting up a data pipeline to export from a Kusto table to a SQL Server table. The only problem is that the target table has two GENERATED ALWAYS columns. I'm looking for some help implementing the solution using Kusto.
This is the export statement:
.export async to sql ['CompletionImport']
h#"connection_string_here"
with (createifnotexists="true", primarykey="CompletionSearchId")
<|set notruncation;
apiV2CompletionSearchFinal
| where hash(SourceRecordId, 1) == 0
Which gives the error:
Cannot insert an explicit value into a GENERATED ALWAYS column in table 'server.dbo.CompletionImport'.
Use INSERT with a column list to exclude the GENERATED ALWAYS column, or insert a DEFAULT into GENERATED ALWAYS column.
So I'm a little unsure how to implement this solution in Kusto. Would I just add a project pipe that excludes the GENERATED ALWAYS columns? Or, ideally, how could I insert a DEFAULT value into the GENERATED ALWAYS SQL Server columns using a Kusto query?
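For reference, the "INSERT with a column list" that the error message suggests would look roughly like the sketch below on the SQL Server side; everything except the CompletionImport table name and the CompletionSearchId column is a placeholder, not my actual schema. (The Kusto export builds the insert itself, so this only illustrates what SQL Server is asking for.)
-- Illustration only: SourceRecordId and the staging table are placeholders.
-- The GENERATED ALWAYS columns are simply omitted so SQL Server fills them in.
INSERT INTO dbo.CompletionImport (CompletionSearchId, SourceRecordId)
SELECT CompletionSearchId, SourceRecordId
FROM dbo.CompletionImport_Staging;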
Edit: I am trying to use materialize to create a temporary table in the cache and export this cached table. However, I can't find any documentation on this, and the operation is failing:
let dboV2CompletionSearch = apiV2CompletionSearchFinal
| project every, variable, besides, generated, always, ones;
let cachedCS = materialize(dboV2CompletionSearch);
.export async to sql ['CompletionImport']
h#"connect_string"
with (createifnotexists="true", primarykey="CompletionSearchId")
<|set notruncation;
cachedCS
| where hash(SourceRecordId, 1) == 0
with the following error message:
Semantic error: 'set notruncation;
cachedCS
| where hash(SourceRecordId, 1) == 0'
has the following semantic error: SEM0100:
'where' operator: Failed to resolve table or column expression named 'cachedCS'.
I have an SSRS report that was pointed to SQL Server views, which pointed to Oracle tables. I edited the SSRS report Dataset so as to query directly from the Oracle db. It seemed like a very simple change until I got this error message:
System.InvalidCastException: Specified cast is not valid.
With the following details: the field ‘UOM_QTY’, pointing at Oracle.ManagedDataAccess.Client.OracleDataReader.GetDecimal(Int32 i).
The SELECT statement on that field is pretty simple:
, (DELV_RECEIPT.INV_LBS/ITEM_UOM_XREF.CONV_TO_LBS) AS UOM_QTY
Does anyone know what would cause the message, and how to resolve the error? My objective is to use the ORACLE data source instead of SQL SERVER.
Error 1
Severity Code Description Project File Line Suppression State
Warning [rsErrorReadingDataSetField] The dataset ‘dsIngredientCosts’ contains a definition for the Field ‘UOM_QTY’. The data extension returned an error during reading the field. System.InvalidCastException: Specified cast is not valid.
at Oracle.ManagedDataAccess.Client.OracleDataReader.GetDecimal(Int32 i)
at Oracle.ManagedDataAccess.Client.OracleDataReader.GetValue(Int32 i)
at Microsoft.ReportingServices.DataExtensions.DataReaderWrapper.GetValue(Int32 fieldIndex)
at Microsoft.ReportingServices.DataExtensions.MappingDataReader.GetFieldValue(Int32 aliasIndex) C:\Users\bl0040\Documents\Visual Studio 2015\Projects\SSRS\Project_ssrs2016\Subscription Reports\Feed Ingredient Weekly Price Avg.rdl 0
Error 2
Severity Code Description Project File Line Suppression State
Warning [rsMissingFieldInDataSet] The dataset ‘dsIngredientCosts’ contains a definition for the Field ‘UOM_QTY’. This field is missing from the returned result set from the data source. C:\Users\bl0040\Documents\Visual Studio 2015\Projects\SSRS\Project_ssrs2016\Subscription Reports\Feed Ingredient Weekly Price Avg.rdl 0
Source Tables:
+------------+---------------+-------------+---------------+-----------+
| Source | TABLE_NAME | COLUMN_NAME | DataSize | COLUMN_ID |
+------------+---------------+-------------+---------------+-----------+
| ORACLE | DELV_RECEIPT | INV_LBS | NUMBER (7,0) | 66 |
+------------+---------------+-------------+---------------+-----------+
| ORACLE | ITEM_UOM_XREF | CONV_TO_LBS | NUMBER (9,4) | 3 |
+------------+---------------+-------------+---------------+-----------+
| SQL SERVER | DELV_RECEIPT | INV_LBS | numeric (7,0) | 66 |
+------------+---------------+-------------+---------------+-----------+
| SQL SERVER | ITEM_UOM_XREF | CONV_TO_LBS | numeric (9,4) | 3 |
+------------+---------------+-------------+---------------+-----------+
The error went away after adding a datatype conversion statement to the data selection.
, CAST(DELV_RECEIPT.INV_LBS/ITEM_UOM_XREF.CONV_TO_LBS AS NUMERIC(9,4)) AS UOM_QTY
Can anyone provide some information on why the original query would be a problem and why the CAST would fix these errors? I tried casting the results because someone on the Code Project forum said...
why don't you use typed datasets? you get such head aches just because
of not coding in a type-safe manner. you have a dataset designer in
the IDE which makes the life better, safer, easier and you don't use
it. I really can't understand.
Here is an approach that fixes this error with an extension method instead of modifying the SQL query. (As to the "why": dividing two Oracle NUMBER columns produces a value whose precision can exceed what .NET's decimal type can represent, so OracleDataReader.GetDecimal throws; both the CAST in the query and the reduced-precision read below avoid that.)
public static Decimal MyGetDecimal(this OracleDataReader reader, int i)
{
    try
    {
        // Works whenever the Oracle NUMBER fits into .NET's decimal range.
        return reader.GetDecimal(i);
    }
    catch (System.InvalidCastException)
    {
        // The value has more precision than decimal can hold, so read it as an
        // OracleDecimal, trim it to a representable precision, then convert.
        Oracle.ManagedDataAccess.Types.OracleDecimal hlp = reader.GetOracleDecimal(i);
        Oracle.ManagedDataAccess.Types.OracleDecimal hlp2 = Oracle.ManagedDataAccess.Types.OracleDecimal.SetPrecision(hlp, 27);
        return hlp2.Value;
    }
}
Thank you for this, but what happens if your query looks like:
SELECT x.* from x
and .GetDecimal appears nowhere?
Any suggestions in that case? I have created a function in ORACLE itself that rounds all values in a result set to avoid this for basic select statements, but this seems wrong for loading updateable datasets...
Obviously this is an old-school approach to getting data.
I'm new to SSIS and am trying to import a flat file into my DB. There are six different rows in the flat file that I need to combine into one row in the database; each of these rows contains a different price for one symbol. For example:
IGBGK 21 w 47
IGBGK 21 u 2.9150
IGBGK 21 h 2.9300
IGBGK 21 l 2.9050
IGBGK 22 h 2.9300
IGBGK 22 l 2.8800
So each of these is in a different row in the flat file, but they should all become one row, with different columns, for symbol IGBGK. I can transform the data to place each number into its own column but cannot get them to combine into one row.
Any help on the direction I need to go with this is greatly appreciated.
End product should look like:
Symbol | col 1 | col 2 | col 3 | col 4 | col 5 | col 6
-------+-------+-------+-------+-------+-------+-------
IGBGK | 47 | 2.915 | 29.30 | 2.905 | 2.930 | 2.880
1. Name a variable with whatever name you want, with the System.Object data type.
2. Use an Execute SQL Task.
3. Query for your table:
WITH ABC
as
(Select * From table --which gives you the original result
)
Select * From ABC
PIVOT (MAX(**4th Column Name**) for **1st Column Name** IN ([col 1],[col 2],[col 3],[col 4],[col 5],[col 6])) AS p
4. Copy the complete query into that task and set the Result Set to Full result set.
5. Switch to the Result Set page, choose the variable you created, and set the result name to 0.
6. Now every time you run the package, the variable will be assigned the complete result table in the format shown above.
7. Specify another seven variables corresponding to each column (Symbol, [col 1], ...); each of these should have the String data type.
Use another Execute SQL Task, specify Variable as the SQL Source Type, then go to the Parameter Mapping page, choose that System.Object variable, and set its Name to 0. After that, go to the Result Set page, choose those seven variables one by one, and set the result names to 0, 1, 2, 3, 4, 5, 6.
From now on, every time you run the package, each variable will be assigned its value. If you want to load them into the target table, here comes the last step.
Use another Execute SQL Task with a query like this:
Insert into table
select ?,?,?,?,?,?,?
Then go to the Parameter Mapping page, choose those seven variables, and set the names to 0, 1, 2, 3, 4, 5, 6, one by one, to map each ?.
There could be some small issues you need to figure out by yourself, like the data types, but the logic is essentially as above.
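To make step 3 more concrete, here is a rough T-SQL sketch of the pivot, assuming the flat-file rows have been staged in a hypothetical table dbo.StagingPrices with placeholder columns Symbol, SessionNum, PriceType, and Price (none of these names come from the original post):
-- Sketch only: dbo.StagingPrices and its column names are hypothetical placeholders.
WITH src AS (
    SELECT Symbol,
           CONCAT(SessionNum, '_', PriceType) AS PriceKey,  -- e.g. '21_w', '22_h'
           Price
    FROM dbo.StagingPrices
)
SELECT Symbol,
       [21_w] AS col1, [21_u] AS col2, [21_h] AS col3,
       [21_l] AS col4, [22_h] AS col5, [22_l] AS col6
FROM src
PIVOT (MAX(Price) FOR PriceKey IN ([21_w], [21_u], [21_h], [21_l], [22_h], [22_l])) AS p;
MAX works here because each Symbol/PriceKey pair appears only once, so the aggregate simply carries the single price through.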
Hope this helps!
Background:
I have a table with the following schema on a SQL Server. Updates to existing rows are possible, and new rows are also added to this table.
unique_id | user_id | last_login_date | count
123-111 | 111 | 2016-06-18 19:07:00.0 | 180
124-100 | 100 | 2016-06-02 10:27:00.0 | 50
I am using Sqoop to add incremental updates in lastmodified mode. My --check-column parameter is the last_login_date column. In my first run, I got the above two records into Hadoop - let's call this the current data. I noted that the last value (the max value of the check column from this first import) is 2016-06-18 19:07:00.0.
Assuming there is a change on the SQL server side, I now have the following changes on the SQL server side:
unique_id | user_id | last_login_date | count
123-111 | 111 | 2016-06-25 20:10:00.0 | 200
124-100 | 100 | 2016-06-02 10:27:00.0 | 50
125-500 | 500 | 2016-06-28 19:54:00.0 | 1
I have the row 123-111 updated with a more recent last_login_date value and the count column has also been updated. I also have a new row 125-500 added.
On my second run, Sqoop looks at all rows with a last_login_date value greater than my known last value from the previous import - 2016-06-18 19:07:00.0.
This gives me only the changed data, i.e. the 123-111 and 125-500 records. Let's call this the new data.
Question
How do I do a merge join in Hadoop/Hive using the current data and the new data so that I end up with the updated version of 123-111, 124-100, and the newly added 125-500?
Loading changed data using Sqoop is a two-phase process.
1st phase - load the changed data into some temp (staging) table using the sqoop import utility.
2nd phase - merge the changed data with the old data using the sqoop-merge utility.
If the table is small (say, a few million records), then just do a full load using sqoop import.
Sometimes it's possible to load only the latest partition - in such a case, use the sqoop import utility to load the partition using a custom query; then, instead of merging, simply insert overwrite the loaded partition into the target table, or copy the files - this will work faster than sqoop merge.
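If you'd rather do the reconciliation in Hive itself instead of with sqoop-merge, a common pattern is to union the old data with the imported delta and keep only the newest row per key. Below is a minimal HiveQL sketch, assuming the current data sits in a table named base_table and the delta in incremental_table (both names, and the target table reconciled, are placeholders), using the schema from the question:
-- Sketch only: base_table, incremental_table and reconciled are placeholder names,
-- and reconciled must already exist with the same schema.
INSERT OVERWRITE TABLE reconciled
SELECT unique_id, user_id, last_login_date, `count`   -- `count` is backquoted because it is a keyword
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY unique_id
                              ORDER BY last_login_date DESC) AS rn
    FROM (
        SELECT * FROM base_table
        UNION ALL
        SELECT * FROM incremental_table
    ) t
) ranked
WHERE rn = 1;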
You can change the existing Sqoop query (by specifying a new custom query) to get ALL the data from the source table instead of getting only the changed data. Refer to using_sqoop_to_move_data_into_hive. This would be the simplest way to accomplish this - i.e. doing a full data refresh instead of applying deltas.