SSIS comma delimited string in where clause - sql-server

I am trying to see if there is an easy answer for this. I have done something similar using multiple pick dropdown parameters in SSRS but this appears to be different.
My scenario is this, so maybe there is an even better answer.
I have a production server that I do not want to make any changes to including temp tables or functions. The production server has a table of clients with about 1600 records. I have set up an SSIS package that will allow transfer of data from production to dev based on a clientid. So my sources would have a query similar to Select Field From Table Where ClientId = ?
This works fine. Now I want to load more than one client, based on data in the clients table. It might be Select ClientId From Clients Where Field = A, which returns multiple ClientIds.
I am able to populate a comma delimited list from an Execute SQL Task into an SSIS variable, so it may be 1,4,8.
If I change my source query to use ClientId in (?) I get a conversion error.
I have looked at many posts that advocate a temp table or a function, which I want to avoid (e.g. Select IN using varchar string with comma delimited values).
I have contemplated building the entire SQL statement into a variable, but this doesn't seem like the right path, as I have many tables to query and transfer where using ClientId = ? works well without having to build each individual SQL statement in a variable.
Is there an easy fix I am missing? I will now turn my research to how I did this in SSRS, but I thought I should try a post here to see if someone has accomplished this before.
I appreciate any info on this, thank you.
EDIT: A key note is that the filter column on Clients exists only on the dev server, so I cannot just use a subquery in the WHERE clause, as the column does not exist on the production server.
EDIT: I did not mention that I am specifically looking at OLEDB sources mapping a parameter to ? in the sql statement.
EDIT: Narrowing down on this, but having trouble relating the SSRS and SSIS functionality. In SSRS it's called a multi-value parameter; in the following link the key line is
WHERE Production.ProductInventory.ProductID IN (@ProductID)
https://msdn.microsoft.com/en-us/library/dn385719(v=sql.110).aspx
This one looks good as well
https://sqlblogcasts.com/blogs/simons/archive/2007/11/22/RS-HowTo---Pass-a-multivalue-parameter-to-a-query-using-IN.aspx
I will keep researching and thank you for the help so far.
I think this sums it up best
This functionality is limited to strictly using embedded SQL.
What SSRS does is transform your SQL column IN (@value) to column IN
(@selectedvalue1, @selectedvalue2) etc.
You need to forget anything you know about the other ways of passing
lists to SQL, i.e. building strings etc., and make sure the data
types you declare are correct for the values of your parameter.
You do not need to use the Join(Parameters!<name>.Value, ",") trick UNLESS
you are passing the list to a stored procedure, in which case you then
need to use some function to turn the delimited list into a rowset,
as you have done.
I hope that helps
The core question is whether I can get the same functionality in SSIS as in SSRS. It reminds me of macro substitution.

If you don't want to create a function, you can use the following in your T-SQL statement.
Declare @ClientIds nvarchar(50) = '123,456'; --<-- Comma delimited list of Client Ids
Select Field
From Table
Where ClientId IN (
    SELECT CAST(RTRIM(LTRIM(Split.a.value('.', 'VARCHAR(100)'))) AS INT) AS ClientIDs
    FROM (
        SELECT CAST('<X>'
                    + REPLACE(@ClientIds, ',', '</X><X>')
                    + '</X>' AS XML) AS Data
    ) AS t
    CROSS APPLY Data.nodes('/X') AS Split(a)
)
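For what it's worth, if the server were on SQL Server 2016 or later, the same split could be done without the XML trick using the built-in STRING_SPLIT function. A minimal sketch, reusing the placeholder table and column names from the answer above:

-- Requires SQL Server 2016+; [Table] and Field are the question's placeholders.
DECLARE @ClientIds nvarchar(50) = '123,456';

SELECT Field
FROM [Table]
WHERE ClientId IN (
    SELECT CAST(value AS INT)
    FROM STRING_SPLIT(@ClientIds, ',')
);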

Related

Mule - Record cannot be mapped as it contains multiple columns with the same label

I need to run a join query against an MS SQL Server 2014 DB based on a column value. The same query runs fine directly against the DB, but through Mule I get an error. The query looks something like this:
SELECT * FROM sch.emple JOIN sch.dept on sch.emple.empid = sch.dept.empid;
The above query works fine when run directly against the MS SQL Server DB, but gives the following error through MuleSoft.
Record cannot be mapped as it contains multiple columns with the same label. Define column aliases to solve this problem (java.lang.IllegalArgumentException). Message payload is of type: String
Please help me out.
Specify the column list directly:
SELECT e.<col1>, e.<col2>, ...., d.<col1>,...
FROM sch.emple AS e
JOIN sch.dept AS d
ON e.empid = d.empid;
Remarks:
You could use aliases instead of schema.table_name
SELECT * in production code is bad practice in 95% of cases.
The duplicated column is empid (there may be others). You could alias the duplicates, e.g. e.empid AS emple_empid and d.empid AS dept_empid, or just specify e.empid once (see the sketch after these remarks).
To avoid specifying all columns manually, you can drag and drop them from Object Explorer to the query pane (see Drag and Drop Column List into query window).
A second way is to use a plugin like Red Gate SQL Prompt to expand SELECT * (image from https://www.simple-talk.com/sql/sql-tools/sql-server-intellisense-vs.-red-gate-sql-prompt/).
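As a concrete sketch of the aliasing fix from the remarks above (only empid comes from the question; the other column names are made up for illustration):

SELECT e.empid    AS emple_empid,
       e.emp_name,                 -- hypothetical column
       d.empid    AS dept_empid,
       d.dept_name                 -- hypothetical column
FROM sch.emple AS e
JOIN sch.dept  AS d
    ON e.empid = d.empid;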
Addendum
But the same query works directly.
It works because you don't bind them. Please read carefully the link I provided about the SELECT * antipattern, especially:
Binding Problems
When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can
often crash your data consumer. Imagine a query that joins two
tables, both of which contain a column called "ID". How would a
consumer know which was which? SELECT * can also confuse views (at
least in some versions of SQL Server) when underlying table structures
change -- the view is not rebuilt, and the data which comes back can
be nonsense. And the worst part of it is that you can take care
to name your columns whatever you want, but the next guy who comes
along might have no way of knowing that he has to worry about adding a
column which will collide with your already-developed names.
by Dave Markle

REST Backend with specified columns, SQL questions

I'm working with a new REST backend talking to a SQL Server. Our REST api allows for the caller to pass in the columns/fields they want returned (?fields=id,name,phone).
The idea seems very normal. The issue I'm bumping up against is resistance to dynamically generating the SQL statement. Any arguments passed in would be passed to the database using a parameterized query, so I'm not concerned about SQL injection.
The basic idea would be to "inject" the passed-in column names into SQL that looks like:
SELECT <column-names>
FROM myTable
ORDER BY <column-name-to-sort-by>
LIMIT 1000
We sanitize all column names and verify their existence in the table, to prevent SQL injection issues. Most of our programmers are used to having all SQL in static files, and loading them from disk and passing them on to the database. The idea of code creating SQL makes them very nervous.
I guess I'm curious if others actually do this? If so, how do you do this? If not, how do you manage "dynamic columns and dynamic sort-by" requests passed in?
I think a lot of people do it, especially when it comes to reporting features. There are actually two things one should do to stay on the safe side:
Parameterize all WHERE clause values
Use user input only to pick the correct column/table names; don't use the user values in the SQL statement at all
To elaborate on item #2, I would have a dictionary where the Key is a possible user input and the Value is a corresponding column/table name. You can store this dictionary wherever you want: config file, database, hard-coded, etc. When you process user input, you just check the dictionary for the Key, and if it exists you use the Value to add a column name to your query (sketched after this answer). This way you use user input only to pick the required column names, but never put the actual values into your SQL statement. Besides, you might not want to expose all columns; with a predefined dictionary you can easily control the list of columns available to a user.
Hope it helps!
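One way to keep that dictionary in the database itself is a plain mapping table. A minimal sketch (all table, column, and key names here are hypothetical):

-- Whitelist mapping: API field name -> real column name.
CREATE TABLE dbo.ApiColumnMap (
    ApiField   sysname NOT NULL PRIMARY KEY,
    ColumnName sysname NOT NULL
);

INSERT INTO dbo.ApiColumnMap (ApiField, ColumnName)
VALUES ('id',    'CustomerId'),
       ('name',  'FullName'),
       ('phone', 'PhoneNumber');

-- Resolve a requested field; no row back means the field is not exposed.
SELECT ColumnName
FROM dbo.ApiColumnMap
WHERE ApiField = 'phone';  -- the user-supplied key

The query text only ever receives column names taken from the Value side of the mapping, so the raw user input never reaches the SQL statement.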
I've done similar to what Maksym suggests. In my case, keys were pulled directly from the database system tables (after scrubbing the user request a bit for syntactic hacks and permissions).
The following query takes care of some minor injection issues through the natural way SQL handles the LIKE condition. This doesn't go as far as handling permissions on each field (as some fields are forbidden based on the log-in) but it provides a very basic way to retrieve these fields dynamically.
CREATE PROC get_allowed_column_names
    @input VARCHAR(MAX)
AS BEGIN
    SELECT
        columns.name AS allowed_column_name
    FROM
        syscolumns AS columns,
        sysobjects AS tables
    WHERE
        columns.id = tables.id AND
        tables.name = 'Categories' AND
        @input LIKE '%' + columns.name + '%'
END
GO
-- The following only returns "Picture"
EXEC get_allowed_column_names 'Category_,Cat%,Picture'
GO
-- The following returns both "CategoryID" and "Picture"
EXEC get_allowed_column_names 'CategoryID, Picture'
GO

Using an IN clause with LINQ-to-SQL's ExecuteQuery

LINQ to SQL did a horrible job translating one of my queries, so I rewrote it by hand. The problem is that the rewrite necessarily involves an IN clause, and I cannot for the life of me figure out how to pass a collection to ExecuteQuery for that purpose. The only thing I can come up with, which I've seen suggested on here, is to use string.Format on the entire query string to kluge around it—but that will prevent the query from ever ending up in the query cache.
What's the right way to do this?
NOTE: Please note that I am using raw SQL passed to ExecuteQuery. I said that in the very first sentence. Telling me to use Contains is not helpful, unless you know a way to mix Contains with raw SQL.
Table-Valued Parameters
On Cheezburger.com, we often need to pass a list of AssetIDs or UserIDs into a stored procedure or database query.
The bad way: Dynamic SQL
One way to pass this list in was to use dynamic SQL.
IEnumerable<long> assetIDs = GetAssetIDs();
var myQuery = "SELECT Name FROM Asset WHERE AssetID IN (" + assetIDs.Join(",") + ")";
return Config.GetDatabase().ExecEnumerableSql(dr=>dr.GetString("Name"), myQuery);
This is a very bad thing to do:
Dynamic SQL gives attackers a weakness by making SQL injection attacks easier.
Since we are usually just concatenating numbers together, this is highly unlikely, but
if you start concatenating strings together, all it takes is one user to type ';DROP TABLE Asset;SELECT '
and our site is dead.
Building the query dynamically meant it couldn't live in a stored procedure, so it had to be stored in code instead of in the DB schema.
Every time we run this query, the query plan must be recalculated. This can be very expensive for complicated queries.
However, it does have the advantage that no additional decoding is necessary on the DB side, since the AssetIDs are found by the query parser.
The good way: Table-Valued Parameters
SQL Server 2008 adds a new ability: users can define a table-valued database type.
Most other types are scalar (they only return one value), but table-valued types can hold multiple values, as long as the values are tabular.
We've defined three types: varchar_array, int_array, and bigint_array.
CREATE TYPE bigint_array AS TABLE (Id bigint NOT NULL PRIMARY KEY)
Both stored procedures and programmatically defined SQL queries can use these table-valued types.
IEnumerable<long> assetIDs = GetAssetIDs();
return Config.GetDatabase().ExecEnumerableSql(dr => dr.GetString("Name"),
    "SELECT Name FROM Asset WHERE AssetID IN (SELECT Id FROM @AssetIDs)",
    new Parameter("@AssetIDs", assetIDs));
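For reference, the same pattern in plain T-SQL, using the bigint_array type defined above (the literal IDs are placeholders):

DECLARE @AssetIDs bigint_array;

INSERT INTO @AssetIDs (Id)
VALUES (1), (2), (3);

SELECT Name
FROM Asset
WHERE AssetID IN (SELECT Id FROM @AssetIDs);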
Advantages
Can be used in both stored procedures and programmatic SQL without much effort
Not vulnerable to SQL injection
Cacheable, stable queries
Does not lock the schema table
Not limited to 8k of data
Less work done by both DB server and the Mine apps, since there is no concatenation or decoding of CSV strings.
"typical use" statistics can be derived by the query analyzer, which can lead to even better performance.
Disadvantages
Only works on SQL Server 2008 and above.
Rumor has it that TVPs are buffered in their entirety before the query executes, which means phenomenally large TVPs may be rejected by the server.
Further investigation of this rumor is ongoing.
Further reading
This article is a great resource to learn more about TVP.
If you can't use table-valued parameters, this option is a little faster than the XML option while still letting you stay away from dynamic SQL: pass the joined list of values as a string parameter, and parse the delimited string back into values in your query. Please see this article for instructions on how to do the parsing efficiently.
I have a sneaking suspicion that you're on SQL Server 2005. Table-valued parameters weren't added until 2008, but you can still use the XML data type to pass sets between the client and the server.
This works for SQL Server 2005 (and later):
create procedure IGetAListOfValues
    @Ids xml -- This will receive a list of values
as
begin
    -- You can load them into a temp table or use it as a subquery:
    create table #Ids (Id int);
    INSERT INTO #Ids
    SELECT DISTINCT params.p.value('.','int')
    FROM @Ids.nodes('/params/p') as params(p);
    ...
end
You have to invoke this procedure with a parameter like this:
exec IGetAListOfValues
    @Ids = '<params> <p>1</p> <p>2</p> </params>' -- xml parameter
The nodes function takes an XPath expression. In this case it's /params/p, which is why the XML uses <params> as the root and <p> as the element.
The value function casts the text inside each p element to int, but you can use it with other data types easily. In this sample there is a DISTINCT to avoid repeated values but, of course, you can remove it depending on what you want to achieve.
I have an auxiliary (extension) method that converts an IEnumerable<T> into a string that looks like the one shown in the execute example. It's easy to create one and have it do the work for you whenever you need it. (You have to test the data type of T and convert it to an adequate string that can be parsed on the SQL Server side.) This way your C# code is cleaner and your SPs follow the same pattern to receive the parameters (you can pass in as many lists as needed).
One advantage is that you don't need to do anything special in your database for it to work.
Of course, you don't need to create a temp table as is done in my example; you can use the query directly as a subquery inside an IN predicate:
WHERE MyTableId IN (SELECT DISTINCT params.p.value('.','int')
                    FROM @Ids.nodes('/params/p') as params(p))
I am not 100% sure that I understand the problem correctly, but LINQ to SQL's ExecuteQuery has an overload that takes parameters, and the query is supposed to use a format similar to string.Format.
Using this overload is safe against SQL injection; behind the scenes, LINQ to SQL translates it to use sp_executesql with parameters.
Here is an example:
string sql = "SELECT * FROM city WHERE city LIKE {0}";
db.ExecuteQuery(sql, "Lon%"); //Note that we don't need the single quotes
This way one can use the benefit of parameterized queries, even while using dynamic sql.
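For illustration, what reaches the server for the call above is roughly the following parameterized batch (a sketch; the parameter names and declared lengths LINQ to SQL generates vary):

EXEC sp_executesql
    N'SELECT * FROM city WHERE city LIKE @p0',
    N'@p0 nvarchar(4000)',
    @p0 = N'Lon%';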
However when it comes to using IN with a dynamic number of parameters, there are two options:
Construct the string dynamically, and then pass the values as an array, as in:
string sql = "SELECT * FROM city WHERE zip IN (";
List<string> placeholders = new List<string>();
for(int i = 0; i < zips.Length;i++)
{
placeholders.Add("{"+i.ToString()+"}");
}
sql += string.Join(",",placeholders.ToArray());
sql += ")";
db.ExecuteQuery(sql, zips.ToArray());
We can use a more compact approach by using the Linq extension methods, as in
string sql = "SELECT * FROM city WHERE zip IN ("+
string.Join("," , zips.Select(z => "{" + zips.IndexOf(f).ToString() + "}"))
+")";
db.ExecuteQuery(sql, zips.ToArray());

Creating SQL Server JSON Parsing/Query UDF

First of all before I get into the question, I'll preface this with the fact that I know that this is a "bad" idea. But for business reasons it is something that I have to come up with a solution to, and I'm hoping that someone, somewhere might have some ideas on how to go about this.
I have a SQL Server 2008 R2 table that has an "OtherProperties" column. This column contains various other, somewhat arbitrary additional pieces of information that relate to the records. There is a business need to create a UDF that we can use to query the results, for example:
SELECT *
FROM MyTable
WHERE MyUDFGetValue(myTable.OtherProperties, 'LinkedOrder[0]') IS NOT NULL
This would find a record where there was an array of LinkedOrder entries that contained a value at index 0
SELECT *
FROM MyTable
WHERE MyUDFGetValue(myTable.OtherProperties, 'SubOrder.OrderId') = 25
This would find a property "OrderId" and use its value in a comparison.
Has anyone seen an implementation of this? I've seen implementations of functions, like this JSONParser, that parse the values into a table, which just will not get us what we need query-wise. Complexity-wise, I don't want to write a full-fledged JSON parser, but I can if I need to.
Not sure if this will suit your needs, but I read about a CLR JSON serializer/deserializer. You can find it here: http://www.sqlservercentral.com/articles/CLR/74160/
It's been a long time since you asked your question, but there is now a solution you can use: JSON Select, which provides various functions for different datatypes, for example the JsonInt() function. From one of your examples (assuming OrderId is an int; if not, you could use a different function):
SELECT *
FROM MyTable
WHERE dbo.JsonInt(myTable.OtherProperties, 'SubOrder.OrderId') = 25
DISCLOSURE:
I am the author of JSON Select, and as such have an interest in you using it :)
If you cannot use SQL Server 2016 with its built-in JSON support, you would need to use CLR, e.g. JSON Select, json4sql, or custom code such as http://www.codeproject.com/Articles/1000953/JSON-for-SQL-Server-Part, etc.
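For completeness, on SQL Server 2016+ the built-in JSON_VALUE function covers both of the original examples directly (a sketch, assuming OtherProperties holds valid JSON):

SELECT *
FROM MyTable
WHERE JSON_VALUE(OtherProperties, '$.LinkedOrder[0]') IS NOT NULL;

SELECT *
FROM MyTable
WHERE CAST(JSON_VALUE(OtherProperties, '$.SubOrder.OrderId') AS INT) = 25;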

How can I generate an INSERT script for a table with a VARBINARY(MAX) field?

I have a table with a VARBINARY(MAX) field (SQL Server 2008 with FILESTREAM)
My requirement is that when I go to deploy to production, I can only supply my IT team with a group of SQL scripts to be executed in a certain order. A new table I am making in production has this VARBINARY(MAX) field. Usually with new tables, I will script out the CREATE TABLE script. And, if I have data I need to go with it, I will then script out the INSERT scripts. Not too complicated.
But with VARBINARY(MAX), the stored procedure I was using to generate the INSERT statements fails on that table. I tried selecting that field, printing it, copying it, converting it to hex, etc. The main issue is that it doesn't select all the data in the field. Checking with DATALENGTH([FileColumn]), if the source row contains 1,004,382 bytes, the most the copied or selected data comes to when inserting again is 8,000 bytes. So basically it is truncated (i.e. invalid) data.
How can I do this better? I tried Googling this like crazy but I must be missing something. Remember, I can't access the filesystem. This has to be all scripted.
If this is a one time (or seldom) thing to do, you can try scripting the data out from the SSMS Wizard as described here:
http://sqlblog.com/blogs/eric_johnson/archive/2010/03/08/script-data-in-sql-server-2008.aspx
Or, if you need to do this frequently and want to automate it, you can try the SQL# SQLCLR library (which I wrote and while most of it is free, the function you need here is not). The function to do this is DB_DumpData and it also generates INSERT statements.
But again, if this is a one time or infrequent task, then try the data export wizard that is built into Management Studio. That should allow you to then create the SQL script that you can run in Production. I just tested this on a table with a VARBINARY(MAX) field containing 3,365,964 bytes of data and the Generate Scripts wizard generated an INSERT statement with the entire hex string of 6.73 million characters for that one value.
UPDATE:
Another quick and easy way to do this in a manner that would allow you to copy / paste the entire INSERT statement into a SQL script and not have to bother with BCP or SSMS Export Wizard is to just convert the value to XML. First you would CONVERT the VARBINARY to VARCHAR(MAX) using the optional style of "1" which gives you a hex string starting with "0x". Once you have the hex string of the binary data you can concatenate that into an INSERT statement and that entire thing, when converted to XML, can contain the entire VARBINARY field. See the following example:
DECLARE @Binary VARBINARY(MAX) = CONVERT(VARBINARY(MAX),
                                         REPLICATE(
                                             CONVERT(NVARCHAR(MAX), 'test string'),
                                             100000)
                                        );

SELECT 'INSERT INTO dbo.TableName (ColumnName) VALUES (' +
       CONVERT(VARCHAR(MAX), @Binary, 1) + ')' AS [Insert]
FOR XML RAW;
Don't script from SSMS.
bcp the data out/in, or use something like SSMS Tools Pack to generate INSERT statements.
It's more than a bit messed up, but in the past, and on the web, I've seen this done using a base64-encoded string: you use an xml value to wrap the string, and from there you can convert it to a varbinary. Here's an example:
http://blogs.msdn.com/b/sqltips/archive/2008/06/30/converting-from-base64-to-varbinary-and-vice-versa.aspx
I can't speak personally to how effective or performant this is, though, especially for large values. Because it is at best an ugly hack, I'd tuck it away inside a UDF somewhere, so that if a better method is found you can update it easily.
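For reference, the core of the trick from that article looks like this (a sketch; the value is a placeholder):

DECLARE @b64 varchar(max) = 'dGVzdA==';  -- base64 for 'test'

-- XQuery's xs:base64Binary does the decoding.
SELECT CAST(N'' AS xml).value(
    'xs:base64Binary(sql:variable("@b64"))',
    'varbinary(max)');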
I have never tried anything like this before, but from the documentation for SQL Server 2008 R2, it sounds like using SUBSTRING will work to get the entire varbinary value, although you may have to work with it in chunks, using UPDATEs with the .WRITE clause to append the data.
Updating Large Value Data Types
Use the .WRITE (expression, @Offset, @Length) clause to perform a partial or full update of varchar(max), nvarchar(max), and varbinary(max) data types. For example, a partial update of a varchar(max) column might delete or modify only the first 200 characters of the column, whereas a full update would delete or modify all the data in the column.
For best performance, we recommend that data be inserted or updated in chunk sizes that are multiples of 8040 bytes.
Hope this helps.
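For what it's worth, a generated script using that approach might look roughly like this (table and column names are made up; real chunks would be hex literals of up to ~8040 bytes each):

-- First chunk: plain INSERT.
INSERT INTO dbo.MyTable (Id, FileColumn)
VALUES (1, 0x0102030405);

-- Each subsequent chunk: append with .WRITE (a NULL offset appends to the end).
UPDATE dbo.MyTable
SET FileColumn.WRITE(0x060708090A, NULL, 0)
WHERE Id = 1;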
