Lookup component fails to match empty strings when full cache is used

I have a lookup component with a lookup table that returns a varchar(4) column with 3 possible values: "T", "R" or "" (empty string).
I'm using an OLE DB connection for the lookup table, and have tried direct access to the table as well as specifying a query with an RTRIM() on the column, to make sure that the string is empty and not a "blank string of some length".
If I set the cache mode to "Partial cache" everything works fine (either with direct reading of the table, or using the trimming query), and the empty strings of the input table are correctly matched to the corresponding lookup table row.
However, if I change the cache mode to "Full cache", none of the empty strings are matched at all.
I've checked that the data type, DT_STR, and length, 4, are the same in the lookup table and the input table.
Is there something that explains this behaviour? Can it be modified?
NOTE: This is not the documented problem with null values. It's about empty strings.

Somewhere, you have trailing spaces, either in your source or your lookup.
Consider the following source query.
SELECT
    D.SourceColumn
,   D.Description
FROM
(
    VALUES
        (CAST('T' AS varchar(4)), 'T')
    ,   (CAST('R' AS varchar(4)), 'R')
    ,   (CAST('' AS varchar(4)), 'Empty string')
    ,   (CAST(' ' AS varchar(4)), 'Blanks')
    ,   (NULL, 'NULL')
) D (SourceColumn, Description);
For my lookup, I restricted the above query to just T, R and the Empty String rows.
You can see that of the 5 source rows, T, R and Empty String matched and went to the Match Output path. The rows where I used a NULL or explicit spaces did not make a match.
If I change my lookup mode from Full Cache to Partial, the NULL continues to not match, while the explicit-spaces row does match.
Wut?
In full cache mode, the Lookup transformation executes the source query and keeps the data locally on the machine SSIS is executing on. This lookup is going to be an exact match using .NET equality rules. In that case, '' will not match ' '.
However, when we change our cache mode to None or Partial, we will no longer be relying on the .NET matching rules; instead, we'll use the source database's matching rules. In T-SQL, '' will match ' '.
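You can see the two comparison rules directly from T-SQL. This is a minimal sketch; the first statement shows the database's ANSI trailing-blank padding that the Partial/None modes inherit:

-- T-SQL pads the shorter string with blanks before comparing: returns 'equal'
SELECT CASE WHEN '' = ' ' THEN 'equal' ELSE 'not equal' END AS database_rule;
-- Full cache instead performs the .NET equivalent of "".Equals(" "),
-- which is false, so rows with trailing blanks fall out of the Match Output.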
To make your Full Cache mode work as expected, you will need to apply an RTRIM in your Source and/or Lookup transformation. If you are convinced RTRIM isn't working in your source, add a Derived Column Transformation and apply your RTRIM there, but I find it's better to abuse the database instead of SSIS.
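For illustration, a full-cache-safe version of the lookup query is just the trim applied in the reference query (dbo.LookupSource is a placeholder name standing in for whatever your lookup actually reads):

SELECT RTRIM(D.SourceColumn) AS SourceColumn
FROM dbo.LookupSource AS D; -- placeholder table name

Apply the same RTRIM on the data-flow side (source query or Derived Column) so both sides of the join cache '' rather than a blank-padded value.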
Biml
Biml, the Business Intelligence Markup Language, describes the platform for business intelligence. BIDS Helper is a free add-on for Visual Studio/BIDS/SSDT that we're going to use to transform the Biml file below into an SSIS package.
The following Biml will generate the demonstration package described above.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2012;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;" />
    </Connections>
    <Packages>
        <Package ConstraintMode="Linear" Name="so_26719974">
            <Tasks>
                <Dataflow Name="DFT Demo">
                    <Transformations>
                        <OleDbSource ConnectionName="CM_OLE" Name="OLESRC Source">
                            <DirectInput>
                                SELECT
                                    D.SourceColumn
                                ,   D.Description
                                FROM
                                (
                                    VALUES
                                        (CAST('T' AS varchar(4)), 'T')
                                    ,   (CAST('R' AS varchar(4)), 'R')
                                    ,   (CAST('' AS varchar(4)), 'Empty string')
                                    ,   (CAST(' ' AS varchar(4)), 'Blanks')
                                    ,   (NULL, 'NULL')
                                ) D (SourceColumn, Description);
                            </DirectInput>
                        </OleDbSource>
                        <Lookup
                            Name="LKP POC"
                            OleDbConnectionName="CM_OLE"
                            NoMatchBehavior="RedirectRowsToNoMatchOutput"
                        >
                            <DirectInput>
                                SELECT
                                    D.SourceColumn
                                FROM
                                (
                                    VALUES
                                        (CAST('T' AS varchar(4)))
                                    ,   (CAST('R' AS varchar(4)))
                                    ,   (CAST('' AS varchar(4)))
                                ) D (SourceColumn);
                            </DirectInput>
                            <Inputs>
                                <Column SourceColumn="SourceColumn" TargetColumn="SourceColumn"></Column>
                            </Inputs>
                        </Lookup>
                        <DerivedColumns Name="DER Default catcher" />
                        <DerivedColumns Name="DER NoMatch catcher">
                            <InputPath OutputPathName="LKP POC.NoMatch" />
                        </DerivedColumns>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>

The issue is that Full Cache uses a .NET equality comparison while Partial and None use SQL.
I have had a similar issue where all works well with a Partial cache, but when I use Full, I get "Row not found" errors, as I'm failing the component on No Match.
My issue was a lower-case string in the source and an UPPER version in the Lookup table, so Full/.NET sees these as different while Partial/SQL is happy to do a case-insensitive join.
Output the No Match rows to a csv file if you want to see the rows that are failing.
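The same two-rule split shows up with case sensitivity, and you can sketch it in T-SQL (this assumes your database uses a typical case-insensitive collation, as most SQL Server installs do):

-- Database rule under a case-insensitive collation: 'match', so Partial cache joins happily
SELECT CASE WHEN 'abc' = 'ABC' THEN 'match' ELSE 'no match' END AS partial_cache_rule;

-- Forcing a case-sensitive compare mimics the full-cache .NET comparison: 'no match'
SELECT CASE WHEN 'abc' = 'ABC' COLLATE Latin1_General_CS_AS THEN 'match' ELSE 'no match' END AS full_cache_rule;

Applying UPPER() to both the source and the lookup query, like the RTRIM above, makes the two cache modes agree.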

Related

ABAP SQL preserve OR pad trailing spaces

I am trying to find a way to preserve a space within SQL concatenation.
For context: the table I am selecting from has a single concatenated key column. Concatenated keys respect spaces.
Example: BUKRS(4) = 'XYZ ', WERKS(4) = 'ABCD' is represented as key XYZ ABCD.
I am trying to form the same value in SQL, but it seems like ABAP SQL auto-trims all trailing spaces.
Select concat( rpad( tvko~bukrs, 4, (' ') ), t001w~werks ) as key, datab, datbi
from t001w
inner join tvko on tvko~vkorg = t001w~vkorg
left join ztab on ztab~key = concat( rpad( tvko~bukrs, 4, (' ') ), t001w~werks ) "This is why I need the concat
rpad( tvko~bukrs, 4, ' ' ) in this example returns 'XYZ' instead of 'XYZ ', which leads to the concatenated value being 'XYZABCD' rather than 'XYZ ABCD'.
lpad seems to work just fine (returning ' XYZ' with the leading space preserved), which leads me to believe I'm doing something wrong.
SQL functions don't accept string literals or variables (which preserve spaces in the same circumstances in ABAP) as they are non-elementary types.
Is there any way to pad/preserve the spaces in ABAP SQL (without pulling data and doing it in application server)?
Update: I solved my problem by splitting key selection from data selection and building the key in ABAP AS. It's a workaround that avoids the problem instead of solving it, so I'll keep the question open in case an actual solution appears.
EDIT: this post doesn't answer the question of inserting a number of characters that varies based on values in some table columns, e.g. the LENGTH function is forbidden in RPAD( tvko~bukrs, LENGTH( ... ), ' ' ). It's only starting from ABAP 7.55 that you can use SQL expressions instead of fixed numbers; you can't do it in ABAP before that. Possible workarounds are to mix ABAP SQL and ABAP (e.g. LIKE 'part1%part2' and then filtering out using ABAP) or to use native SQL directly (ADBC, AMDP, etc.)
Concerning how the trailing spaces are managed in OpenSQL/ABAP SQL, they seem to be ignored, the same way as they are ignored with ABAP fixed-length character variables.
Demonstration: I simplified your example to extract the row named 'Walldorf plant'.
These don't work (no row returned):
SELECT * FROM t001w
    WHERE concat( 'Walldorf ' , 'plant' ) = t001w~name1
    INTO TABLE @DATA(itab_1).
SELECT * FROM t001w
    WHERE concat( rpad( 'Walldorf', 1, ' ' ), 'plant' ) = t001w~name1
    INTO TABLE @DATA(itab_2).
These two work, one with leading space(s), one using concat_with_space:
SELECT * FROM t001w
    WHERE concat( 'Walldorf', ' plant' ) = t001w~name1
    INTO TABLE @DATA(itab_3).
SELECT * FROM t001w
    WHERE concat_with_space( 'Walldorf', 'plant', 1 ) = t001w~name1
    INTO TABLE @DATA(itab_4).
General information: ABAP documentation - SQL string functions
EDIT: working example added, using leading space(s).

Snowflake JSON with foreign language to tabular format dynamically

I read through the Snowflake documentation and the web and found only one solution to my problem, by Greg Pavlik (https://stackoverflow.com/users/12756381/greg-pavlik), which can be found here: Snowflake JSON to tabular
This doesn't work on data with Russian attribute names and attribute values. What modifications can be made for this to fit my case?
Here is an example:
create or replace table target_json_table(
    v variant
);
INSERT INTO target_json_table SELECT parse_json('{
    "at": {
        "cf": "NV"
    },
    "pd": {
        "мо": "мо",
        "ä": "ä",
        "retailerName": "retailer",
        "productName": "product"
    }
}');
call create_view_over_json('target_json_table', 'V', 'MY_VIEW');
ERROR: Encountered an error while creating the view. SQL compilation error: syntax error line 7 at position 7 unexpected 'ä:'. syntax error line 8 at position 7 unexpected 'мо'.
There was a bug in the original SQL used as a basis for the creation of the stored procedure. I have corrected that. You can get an update on the Github page. The changed section is here:
sql =
`
SELECT DISTINCT '"' || array_to_string(split(f.path, '.'), '"."') || '"' AS path_name, -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element"). It also strips any bracket-enclosed array element references (like "[0]")
       DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING') AS attribute_type, -- This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
       '"' || array_to_string(split(f.path, '.'), '.') || '"' AS alias_name -- This generates column aliases based on the path
FROM
       #~TABLE_NAME~#,
       LATERAL FLATTEN(#~COL_NAME~#, RECURSIVE=>true) f
WHERE  TYPEOF(f.value) != 'OBJECT'
  AND  NOT contains(f.path, '[') -- This prevents traversal down into arrays
limit  ${ROW_SAMPLE_SIZE}
`;
Previously this SQL simply replaced non-ASCII characters with underscores. The updated SQL wraps key names in double quotes so that non-ASCII key names are preserved.
Be sure that's what you want it to do. Also, the keys are nested. I decided that the best way to handle that is to create column names in the view with dot notation, for example one column name is pd.ä. That will require wrapping the column name with double quotes, such as:
select * from MY_VIEW where "pd.ä" = 'ä';
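For context, this is roughly the shape of view the updated procedure builds over the sample document. The body below is hand-written for illustration rather than the procedure's exact output:

CREATE OR REPLACE VIEW MY_VIEW AS
SELECT
    v:"at"."cf"::string           AS "at.cf",
    v:"pd"."мо"::string           AS "pd.мо",
    v:"pd"."ä"::string            AS "pd.ä",
    v:"pd"."retailerName"::string AS "pd.retailerName",
    v:"pd"."productName"::string  AS "pd.productName"
FROM target_json_table;

The double quotes do the work twice over: once in the JSON path so Snowflake can traverse the non-ASCII keys, and once in the column aliases so the view's column names keep them.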
Final note: The name of your stored procedure is create_view_over_json, however, in the Github project the name is create_view_over_variant. When you update, be sure to call the right procedure.

BigQuery or SQL Server SPLIT query

I have searched around and cannot find much on this topic. I have a table that gets logging information. As a result, the column I am interested in contains multiple values that I need to search against. The column is formatted in a PHP URL style, i.e.
/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32
This makes all searches end up with really long regexes to get data, then join statements to combine it.
Is there a way in BigQuery, or SQL Server that I can pull the information from that column and put it into new columns?
Example:
The information I would like extracted begins after the ? and each piece ends at an &. The string can sometimes be longer and contain additional headers.
Thanks,
Below is for BigQuery Standard SQL and addresses below aspect of your question
Is there a way in BigQuery, ... that I can pull the information from that column and put it into new columns?
#standardSQL
CREATE TEMP FUNCTION parseColumn(kv STRING, column_name STRING) AS (
    IF(SPLIT(kv, '=')[OFFSET(0)] = column_name, SPLIT(kv, '=')[OFFSET(1)], NULL)
);
WITH `project.dataset.table` AS (
    SELECT '/test/test.aspx?extra=abc&DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url UNION ALL
    SELECT '/test/test.aspx?DS_Vendor=55192&DS_ProdVer=4.30.100.0&more=123&DS_ProdLang=DE&DS_Product=MTE&DS_OfficeBits=64'
)
SELECT
    MIN(parseColumn(kv, 'DS_Vendor')) AS DS_Vendor,
    MIN(parseColumn(kv, 'DS_ProdVer')) AS DS_ProdVer,
    MIN(parseColumn(kv, 'DS_ProdLang')) AS DS_ProdLang,
    MIN(parseColumn(kv, 'DS_Product')) AS DS_Product,
    MIN(parseColumn(kv, 'DS_OfficeBits')) AS DS_OfficeBits
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS kv
GROUP BY url
with the result as below
Row  DS_Vendor  DS_ProdVer  DS_ProdLang  DS_Product  DS_OfficeBits
1    55039      7.90.100.0  EN           MTT         32
2    55192      4.30.100.0  DE           MTE         64
Below is also addressed
The string can sometimes be longer, and contains additional headers.
One example using BigQuery (with standard SQL):
SELECT REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')
FROM (
    SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
This returns the parts of the URL as an ARRAY<STRING>. To go one step further, you can get back an ARRAY<STRUCT<key STRING, value STRING>> with a query of this form:
SELECT
    ARRAY(
        SELECT AS STRUCT
            SPLIT(part, '=')[OFFSET(0)] AS key,
            SPLIT(part, '=')[OFFSET(1)] AS value
        FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS part
    ) AS keys_and_values
FROM (
    SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
...or with the keys and values as top-level columns:
SELECT
    SPLIT(part, '=')[OFFSET(0)] AS key,
    SPLIT(part, '=')[OFFSET(1)] AS value
FROM (
    SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
CROSS JOIN UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS part
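For the SQL Server half of the question, here is a hedged sketch using STRING_SPLIT (available from SQL Server 2016 onward; the inline VALUES row stands in for your logging table):

-- Strip everything through '?', split the remainder on '&',
-- then split each piece on its first '=' into key/value columns.
SELECT
    t.url,
    [key]   = LEFT(s.value, CHARINDEX('=', s.value) - 1),
    [value] = STUFF(s.value, 1, CHARINDEX('=', s.value), '')
FROM (VALUES
    ('/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32')
) AS t(url)
CROSS APPLY STRING_SPLIT(STUFF(t.url, 1, CHARINDEX('?', t.url), ''), '&') AS s
WHERE CHARINDEX('=', s.value) > 0;

From there, conditional aggregation such as MAX(CASE WHEN [key] = 'DS_Vendor' THEN [value] END) grouped by url pivots the pairs into columns, mirroring the BigQuery result above.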

How to use selected SQL statement in SSIS package as source variable?

I created an SSIS package where I need to use a FELC (Foreach Loop Container). The first step before the loop is to run an Execute SQL Task to obtain all the SQL statements designed to generate different XML files, which are stored in a source table. Inside the FELC I would like to process the statements to generate the XML files and send them to various folder locations, with names and target folders coming from the source table. There are hundreds of files that need to be refreshed on a regular basis. Instead of running single jobs for each XML file generation, I would like to amalgamate it into one process.
Is it possible?
This is the basic Shred Recordset pattern.
I have 3 variables declared: QuerySource, CurrentStatement and rsQueryData. The first 2 are Strings, the last is an Object type.
SQL - Get source data
This is my query. It simulates your table and induces a failing SQL Statement if I take out the filter.
SELECT
    ProcessID
,   Stmt_details
FROM
(
    VALUES
        ( 1, 'SELECT 1;', 1)
    ,   ( 20, 'SELECT 20;', 1)
    ,   ( 30, 'SELECT 1/0;', 0)
) Stmt_collection (ProcessID, Stmt_details, xmlFlag)
WHERE
    xmlFlag = 1
The Execute SQL Task is set with Recordset = Full and I assign it to variable User::rsQueryData which has a name of 0 in the mapping tab.
FELC
This is a standard Foreach ADO Recordset Loop container. I use my User::rsQueryData as the source and since I only care about the second element, ordinal position 1, that's the only thing I map. I assign the current value to User::CurrentStatement
SQL - Execute CurrentStatement
This is an Execute SQL Task that has as its source the Variable User::CurrentStatement. There's no scripting involved. The FELC handles the assignment of values to this Variable. This Task uses as its source that same Variable. This is very much how native SSIS developers will approach solving a problem. If you reach for a Script Task or Component as the first approach, you're likely doing it wrong.
Biml
If you're doing any level of SSIS/SSRS/SSAS development, you want BIDS Helper. It is a free add-on to Visual Studio that makes your development life so much easier. The feature I'm going to leverage here is the ability to declaratively define an SSIS package. This language is called the Business Intelligence Markup Language, Biml, and I love it for many reasons, but on StackOverflow I love it because I can give you the code to reproduce exactly my solution. Otherwise, I have to build out a few hundred screenshots showing you everywhere I have to click and set values.
Or, you can:
1. Download and install BIDS Helper
2. Open up your existing SSIS project
3. Right click on the Project and select "Add new Biml file"
4. In the resulting BimlScript.biml file, open it up and paste all of the following code into it
5. Fix the value for your database connection string. This one assumes you have an instance on your local machine called Dev2014
6. Save the biml file
7. Right click that BimlScript.biml and select "Generate SSIS Packages"
8. Marvel at the resulting so_28867703.dtsx package that was added to your solution
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;" />
    </Connections>
    <Packages>
        <Package ConstraintMode="Linear" Name="so_28867703">
            <Variables>
                <Variable DataType="String" Name="QuerySource">SELECT ProcessID, Stmt_details FROM (VALUES (1, 'SELECT 1;', 1), (20, 'SELECT 20;', 1), (30, 'SELECT 1/0;', 0))Stmt_collection(ProcessID, Stmt_details, xmlFlag) WHERE xmlFlag = 1 </Variable>
                <Variable DataType="String" Name="CurrentStatement">This statement is invalid</Variable>
                <Variable DataType="Object" Name="rsQueryData"></Variable>
            </Variables>
            <Tasks>
                <ExecuteSQL
                    ConnectionName="CM_OLE"
                    Name="SQL - Get source data"
                    ResultSet="Full"
                >
                    <VariableInput VariableName="User.QuerySource" />
                    <Results>
                        <Result VariableName="User.rsQueryData" Name="0" />
                    </Results>
                </ExecuteSQL>
                <ForEachAdoLoop
                    SourceVariableName="User.rsQueryData"
                    ConstraintMode="Linear"
                    Name="FELC - Shred RS"
                >
                    <VariableMappings>
                        <!-- 0 based system -->
                        <VariableMapping VariableName="User.CurrentStatement" Name="1" />
                    </VariableMappings>
                    <Tasks>
                        <ExecuteSQL ConnectionName="CM_OLE" Name="SQL - Execute CurrentStatement">
                            <VariableInput VariableName="User.CurrentStatement" />
                        </ExecuteSQL>
                    </Tasks>
                </ForEachAdoLoop>
            </Tasks>
        </Package>
    </Packages>
</Biml>
That package will run, assuming you fixed the connection string to a valid instance. You can see below that if you put break points on the Execute SQL Task, it will light up two times. If you have a watch window on CurrentStatement, you can see it change from the design time value to the values shredded from the result set.
While we await clarification on XML and files, if the goal is to take the query from the FELC and export it to a file, I answered that here: https://stackoverflow.com/a/9105756/181965. Although in this case, I'd restructure your package to just the Data Flow and eliminate the shredding, as there's no need to complicate matters to export a single row N times.
If I understand you correctly: you can add a "Script Task" from the Toolbox as the first step of the loop container, store the selected statement from the database in a global variable, and pass it for execution in the next step.

Checking if an XML element is marked with `xsi:nil` in SQL

I am working on a stored procedure which shreds an XML document. One of the child elements in the records being processed can sometimes be marked with the xsi:nil="true" attribute. Other times, it can contain a dateTime. I'm trying to insert a string into a column of my table which depends on whether or not this element has a value. For example:
[Status] = CASE WHEN (Rt.Item.value('(./Date)[1]', 'nvarchar(max)') = '') THEN N'SUBMITTED' ELSE N'PROCESSED' END
Unfortunately, this doesn't seem to be working. What's the correct way to check if an element has a value in SQL Server?
Generally:
theElementName[not(@xsi:nil eq 'true')]/any/other/needed/location/steps
If the association of the "xsi" prefix to the appropriate namespace isn't registered (the way to do this is implementation-specific and you need to check how this is to be done in your situation), one still can use:
theElementName[not(@*[name() eq 'xsi:nil'] eq 'true')]
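In SQL Server specifically, you can bind the xsi prefix with WITH XMLNAMESPACES and let exist() do the test. A minimal sketch, assuming a document shaped like the question describes (@doc and the node paths are illustrative):

DECLARE @doc xml = N'<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Record><Date xsi:nil="true" /></Record>
    <Record><Date>2024-01-15T00:00:00</Date></Record>
</Root>';

WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' AS xsi)
SELECT
    [Status] = CASE
                   WHEN Rt.Item.exist('(./Date)[1][@xsi:nil = "true"]') = 1
                   THEN N'SUBMITTED' -- element is explicitly nil
                   ELSE N'PROCESSED' -- element carries a dateTime
               END
FROM @doc.nodes('/Root/Record') AS Rt(Item);

Testing for the nil attribute directly is more robust than comparing the element's text to an empty string, since a nilled element and a genuinely empty one both yield ''.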
