Is there any option to create a custom Profile Request for SSIS Data Profiling Task?
At the moment there are 5 standard profile requests under the SSIS Data Profiling Task:
Column Null Ratio Profile Request
Column Statistics Profile Request
Column Length Distribution Profile Request
Column Value Distribution Profile Request
Candidate Key Profile Request
I need to add a custom one to get a summary of all numeric values.
Thanks in advance for your help.
Based on the Microsoft documentation, the SSIS Data Profiling Task provides only the 5 main profiles (listed in your question), and there is no option to add a custom profile.
For that reason, I would use an Execute SQL Task to achieve this; you can use the aggregate functions you need, with the ISNUMERIC function in the WHERE clause:
SELECT MAX(CAST([Column] AS BIGINT))  -- Maximum value
      ,MIN(CAST([Column] AS BIGINT))  -- Minimum value
      ,COUNT([Column])                -- Count of non-null values
      ,COUNT(DISTINCT [Column])       -- Count of distinct values
      ,AVG(CAST([Column] AS BIGINT))  -- Average
      ,SUM(CAST([Column] AS BIGINT))  -- Sum
FROM [Table]
WHERE ISNUMERIC([Column]) = 1
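Note that ISNUMERIC also returns 1 for values such as '1.5' or '$', which the CAST to BIGINT would then fail on at runtime; on SQL Server 2012 and later, TRY_CAST([Column] AS BIGINT) is a safer alternative, since it returns NULL instead of raising an error.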
I think what you want to do here is create a computed column that is populated with your source column only if IsNumeric(SourceColumn) = 1.
Then create a profile task using Column Value Distribution Profile Request on the computed column, with ValueDistributionOption set to AllValues.
Edit:
To further clarify, the computed column doesn't have to be a task in SSIS, although that's how I was thinking about it when I came up with my answer. You could ALTER the table you want to profile, adding the computed column, and then create the Profile Task as I explained above.
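For example, a minimal sketch of that ALTER (the table and column names here are placeholders for your own):
ALTER TABLE dbo.MyTable
ADD NumericOnly AS (CASE WHEN ISNUMERIC([Column]) = 1 THEN [Column] END);
The computed NumericOnly column returns the source value for numeric rows and NULL otherwise, so the Column Value Distribution profile effectively summarizes only the numeric values.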
I was also assuming that you wanted to profile the values of a single column. If you want to do this for multiple columns, or need to profile summary values aggregated from detail records, then this answer may not be the best solution.
I have three fields (badge number, termination date, and status) to update in Salesforce based on badge number. I'm using an Update Strategy in the mapping and, at the session level, upsert with badge_number_c as the external ID lookup field and "Treat source rows as" set to Data Driven (session properties). However, only 50 records get updated, while 20,000 records are rejected because their badge numbers are not present in the target; those 20K records are then attempted as inserts and rejected as well (since we only mapped the update fields, not all the fields needed to form a record in Salesforce). Writing the error log for all these rejects takes a long time, so the workflow run time is high.
I tried removing the upsert and the external lookup field, but then it throws an error saying the Id field is missing.
It looks like you are trying to update the Salesforce target using an Informatica target definition, and you are mixing two approaches.
If you are using only an Update Strategy with "Treat source rows as" set to Data Driven (session properties), then make sure you handle the update condition in the Update Strategy.
For example,
First, calculate an INSERT_UPDATE_FLAG using a lookup on the target, joining on the primary key columns (a sketch of this flag is shown after the examples below).
Then use it in the Update Strategy with logic like the following.
IIF(INSERT_UPDATE_FLAG = 'UPD', DD_UPDATE, DD_INSERT) -- if you want upsert logic
or
IIF(INSERT_UPDATE_FLAG = 'UPD', DD_UPDATE, DD_REJECT) -- if you want update-only logic (non-matching rows are rejected)
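For completeness, the INSERT_UPDATE_FLAG itself could be derived in an Expression transformation roughly like this (a sketch; LKP_ID stands for a hypothetical port returned by the target lookup, NULL when no match is found):
IIF(ISNULL(LKP_ID), 'INS', 'UPD')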
Also note that you need to mark the primary key columns in the Informatica target definition, otherwise the update won't work.
As per your screenshot, if you want to use Salesforce-specific upsert logic, you need to be careful and follow the link below. It is a multi-step process: create the external ID first, then use it to do the lookup and update.
https://knowledge.informatica.com/s/article/124909?language=en_US
I am surveying employees (asking each of them several "1-5 Opinion Scale" questions) and want to provide relative anonymity by suppressing the display of results when there are fewer than (say) 6 people in the result set. GDS doesn't allow aggregate filters on pivot tables. Does anyone know a way around this using calculated fields or some other mechanism?
In this case, I'd suggest adding this functionality at the source of your data. For example, if your data is in BigQuery you could import it with a custom query like this:
SELECT *
FROM `your_table`
WHERE 5 < (SELECT COUNT(*) FROM `your_table`)
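This returns rows only when the whole table has more than 5 rows. If you need the threshold per group instead (say, per question), a similar pattern with GROUP BY and HAVING could work; a sketch, assuming hypothetical question_id and score columns:
SELECT question_id, AVG(score) AS avg_score
FROM `your_table`
GROUP BY question_id
HAVING COUNT(*) >= 6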
I am trying to write a query in SQL Server that will automatically generate an email to the email provided in the table as I delete that row. Is this possible?
What I'm trying to achieve: each row has a notes column where approvers can add their comments. Sometimes I have to restart a specific row, so I wrote a query to set its overall status back to Draft (regardless of its current status). In doing so, every other column becomes NULL. What I want is that, as I set the row back to Draft, an automatic email is generated that sends the contents of the notes column to the email address stored in that row.
Is this possible?
Thank you in advance.
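This kind of notification is typically done with a DML trigger that calls Database Mail. A minimal sketch, assuming a hypothetical dbo.Approvals table with Id, Status, Notes, and Email columns, and an already-configured Database Mail profile named 'MailProfile':
CREATE TRIGGER trg_Approvals_DraftEmail
ON dbo.Approvals
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @recipient NVARCHAR(256), @notes NVARCHAR(MAX);

    -- Grab the pre-update notes and email of a row that was just reset to Draft
    SELECT @recipient = d.Email, @notes = d.Notes
    FROM deleted AS d
    JOIN inserted AS i ON i.Id = d.Id
    WHERE i.Status = 'Draft' AND d.Status <> 'Draft';

    IF @recipient IS NOT NULL
        EXEC msdb.dbo.sp_send_dbmail
            @profile_name = 'MailProfile',   -- assumed mail profile name
            @recipients   = @recipient,
            @subject      = 'Row reset to Draft',
            @body         = @notes;
END;
Note this sketch handles one updated row at a time; a set-based version would need to iterate over all affected rows, and Database Mail must be enabled on the server.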
I have a SQL Server database with a table containing 300,000 rows. There is an index on the primary key and another key. I am using the following query in my standalone WCF server to fetch the data using a SqlConnection and SqlDataReader.
SELECT * FROM Users WHERE UserTypeId = @UserTypeId
ORDER BY Users.Id OFFSET @Offset ROWS FETCH NEXT @Number ROWS ONLY
The data returned by the DataReader is pushed into my own class/model and then returned by the WCF service method.
The WPF client connects to the server, starts the command, and requests only 500 rows. However, this takes about 3-4 seconds. (Not to mention the time for all the data...)
The returned list is then used as the DataContext for the WPF DataGrid.
My question is: what can I check, or what might be wrong? If you need more information, code samples, etc., please let me know!
First, don't use SELECT *; instead, specify which fields you want from the table. As it is, you are fetching data that you don't need, for example the UserTypeId field, which you already know for every record you get back.
Then you can create a covering index that contains UserTypeId and Id, and has any other fields that you want to return from the query as included fields. That way the database can run the query against the index alone, and doesn't have to read anything from the actual table.
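For example, a sketch of such an index (the INCLUDE list is a placeholder for whatever columns your query actually returns):
CREATE NONCLUSTERED INDEX IX_Users_UserTypeId_Id
ON dbo.Users (UserTypeId, Id)
INCLUDE (Name, Email);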
I am copying some user data from one SQL Server to another. Call them Alpha and Beta. The SSIS package runs on Beta and gets the rows on Alpha that meet a certain condition. The package then adds the rows to Beta's table. Pretty simple, and that works great.
The problem is that I only want to add new rows into Beta. Normally I would just do something simple like....
INSERT INTO BetaPeople
SELECT * From AlphaPeople
where ID NOT IN (SELECT ID FROM BetaPeople)
But this doesn't work in an SSIS package. At least I don't know how and that is the point of this question. How would one go about doing this across servers?
Your example seems simple; it looks like you are only adding new people, not looking for changed data in existing records. In that case, store the last transferred ID in the database.
CREATE TABLE dbo.LAST (RW int, LastID int)
GO
INSERT INTO dbo.LAST (RW, LastID) VALUES (1, 0)
After each transfer, use this to store the ID of the last row transferred:
UPDATE dbo.LAST SET LastID = @myLastID WHERE RW = 1
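In the package, @myLastID could come from a package variable populated in the data flow, or you could simply recompute it after the load (a sketch, assuming an IDENTITY ID column on the destination table):
UPDATE dbo.LAST SET LastID = (SELECT MAX(ID) FROM dbo.BetaPeople) WHERE RW = 1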
When configuring the OLE DB Source, set the data access mode to SQL Command and use:
DECLARE @Last int
SET @Last = (SELECT LastID FROM dbo.LAST WHERE RW = 1)
SELECT * FROM AlphaPeople WHERE ID > @Last;
Note, I do assume that you are using ID int IDENTITY for your PK.
If you have to monitor for data changes in existing records, then add a "last changed" column to every table and store the time of the last transfer.
A different technique would involve setting up a linked server on Beta pointing to Alpha and running your example query without SSIS. I would expect this to be way slower and more resource-intensive than the SSIS solution.
INSERT INTO dbo.BetaPeople
SELECT * FROM [Alpha].[myDB].[dbo].[AlphaPeople]
WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)
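For reference, the [Alpha] linked server could be created roughly like this (a sketch; the server name and data source are placeholders):
EXEC sp_addlinkedserver
     @server = N'Alpha',
     @srvproduct = N'',
     @provider = N'MSOLEDBSQL',
     @datasrc = N'AlphaHostName';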
Add a Lookup between your source and destination.
Right-click the Lookup box to open the Lookup Transformation Editor.
Choose [Redirect rows to no match output].
Open Columns and map your key columns.
Add an entry with the table key in the lookup column, lookup operation as …
Connect the Lookup box to the destination and choose [Lookup No Match Output].
The simplest method I have used is as follows:
Query Alpha in a Source component in a Data Flow to bring in the records.
Perform any needed transformations.
Before writing to the destination (Beta), perform a Lookup matching the ID column from Alpha to those in Beta. On the first page of the Lookup Transformation Editor, make sure you select "Redirect rows to no match output" from the dropdown list "Specify how to handle rows with no matching entries".
Link the Lookup to the destination. This will give you a prompt where you can specify that it is the unmatched rows (the "Lookup No Match Output") that you want to insert.
This is the classic delta detection issue. The best solution is to use Change Data Capture, with or without SSIS. If what you are looking for is a one-off activity, there is no need to go for SSIS; use other means, such as a linked server, and compare with the existing records.
The following should solve the issue of loading changed and new records using SSIS:
Extract data from the source using a Data Flow.
Extract data from the target.
Match on the primary key and split the records: matched and unmatched records from the source, and matched records from the target. Call them Matched_Source, Unmatched_Source, and Matched_Target.
Compare Matched_Source and Matched_Target, and split Matched_Source into Changed and Unchanged.
Null-load (truncate) a TempChanged table.
Add the Changed records to TempChanged.
Execute a SQL script/stored proc to delete records from the target whose primary key is in TempChanged, then add the records in TempChanged to the target (see the sketch below).
Add Unmatched_Source to the target.
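The delete-and-insert step above could look roughly like this (a sketch; the table and key names are placeholders):
DELETE t
FROM dbo.Target AS t
JOIN dbo.TempChanged AS c ON c.Id = t.Id;

INSERT INTO dbo.Target
SELECT * FROM dbo.TempChanged;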
Another solution would be to use a temporary table.
In the properties of Beta's connection manager, change RetainSameConnection to true (by default, SSIS runs each query in its own connection, which would mean the temporary table is dropped as soon as the task that created it completes).
Create an Execute SQL Task using Beta's connection and use the following SQL to create your temporary table:
SELECT TOP 0 *
INTO ##beta_temp
FROM dbo.BetaPeople
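A global temporary table (##) is used here because it is visible to other connections and sessions, unlike a local # temp table, which only the creating session can see.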
Next, create a data flow that pulls data from Alpha and loads it into ##beta_temp. (You will need to run the SQL statement above in SSMS first so that Visual Studio can see the table at design time, and you will also need to set the DelayValidation property to true on the Data Flow task.)
Now you have both tables on the same server, and you can just use your example SQL, modified to use the temporary table:
INSERT INTO dbo.BetaPeople
SELECT * FROM ##beta_temp
WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)