SQL Server Collation to Match SSIS Ordering

Example values from a SQL Server data set as they appear after an ORDER BY clause:
X0000000-2009
X000000-1-2010
X0000001-2010
If I use ORDER BY Field COLLATE Latin1_General_bin they come out slightly differently as:
X000000-1-2010
X0000000-2009
X0000001-2010
I'm looking to use this data in a Merge Join transformation in SSIS without the need for a Sort transformation. However, a Sort transformation will order them as:
X0000000-2009
X0000001-2010
X000000-1-2010
This is problematic because I need to match the SSIS ordering in my SQL source query in order for the Merge Join to work properly.
Is there a collation I can use in my ORDER BY that is guaranteed to match the SSIS ordering?
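
For anyone who wants to reproduce the comparison, here is a minimal sketch that loads the three sample values into a table variable and sorts them under an explicit collation. The table variable @t is only a stand-in for the real source table, and the script makes no claim about which collation (if any) matches the SSIS Sort; it just makes it easy to try candidates by swapping the COLLATE clause.

```sql
-- Stand-in for the real source table (hypothetical name @t).
DECLARE @t TABLE (Field VARCHAR(20));

INSERT INTO @t (Field)
VALUES ('X0000000-2009'), ('X000000-1-2010'), ('X0000001-2010');

-- Binary collation compares byte values, and '-' (0x2D) sorts before '0' (0x30),
-- so this returns X000000-1-2010, X0000000-2009, X0000001-2010,
-- the same order reported in the question.
SELECT Field
FROM @t
ORDER BY Field COLLATE Latin1_General_BIN;

-- Swap in another collation name here to compare its order with the SSIS output.
SELECT Field
FROM @t
ORDER BY Field COLLATE Latin1_General_CI_AS;
```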

Related

Select from one table where not in another in SQL Server

I have a SQL Server database on my computer, and there are two tables in it.
This is the first one:
SELECT
[ParticipantID]
,[ParticipantName]
,[ParticipantNumber]
,[PhoneNumber]
,[Mobile]
,[Email]
,[Address]
,[Notes]
,[IsDeleted]
,[Gender]
,[DOB]
FROM
[Gym].[dbo].[Participant]
and this is the second one
SELECT
[ParticipationID]
,[ParticipationNumber]
,[ParticpationTypeID]
,[AddedByEmployeeID]
,[AddDate]
,[ParticipantID]
,[TrainerID]
,[ParticipationDate]
,[EndDate]
,[Fees]
,[PaidFees]
,[RemainingFees]
,[IsPeriodParticipation]
,[NoOfVisits]
,[Notes]
,[IsDeleted]
FROM
[Gym].[dbo].[Participation]
Now I need to write a T-SQL query that can return
SELECT
Participant.ParticipantNumber,
Participation.ParticipationDate,
Participation.EndDate
FROM
Participation
WHERE
Participant.ParticipantID = Participation.ParticipantID;
I would be grateful for any help.

SQL Server performs sort, intersect, union, and difference operations using in-memory sorting and hash join technology. Using this type of query plan, SQL Server supports vertical table partitioning, sometimes called columnar storage.
SQL Server employs three types of join operations:
Nested Loops joins
Merge joins
Hash joins
Join Fundamentals
By using joins, you can retrieve data from two or more tables based on logical relationships between the tables. Joins indicate how Microsoft SQL Server should use data from one table to select the rows in another table.
A join condition defines the way two tables are related in a query by:
Specifying the column from each table to be used for the join. A typical join condition specifies a foreign key from one table and its associated key in the other table.
Specifying a logical operator (for example, = or <>) to be used in comparing values from the columns.
Inner joins can be specified in either the FROM or WHERE clauses. Outer joins can be specified in the FROM clause only. The join conditions combine with the WHERE and HAVING search conditions to control the rows that are selected from the base tables referenced in the FROM clause.
Follow this link to help you understand joins better in mssql:
link to joins
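
As a rough sketch of the join condition described above, the script below uses the column names from the question, but the table variables and sample rows are made up so that it runs on its own. The first query is an inner join written in the FROM clause. The second is one possible reading of the question title (participants with no matching row in Participation), written with NOT EXISTS.

```sql
-- Stand-ins for [Gym].[dbo].[Participant] and [Gym].[dbo].[Participation],
-- reduced to the columns the query needs. The sample rows are invented.
DECLARE @Participant TABLE (ParticipantID INT PRIMARY KEY, ParticipantNumber VARCHAR(10));
DECLARE @Participation TABLE (
    ParticipationID INT PRIMARY KEY,
    ParticipantID INT,
    ParticipationDate DATETIME,
    EndDate DATETIME
);

INSERT INTO @Participant VALUES (1, 'P-001'), (2, 'P-002');
INSERT INTO @Participation VALUES (10, 1, '20240101', '20240331');

-- Inner join: the join condition names the key column from each table.
SELECT p.ParticipantNumber, pn.ParticipationDate, pn.EndDate
FROM @Participant AS p
INNER JOIN @Participation AS pn
    ON p.ParticipantID = pn.ParticipantID;

-- Participants that have no row at all in Participation.
SELECT p.ParticipantNumber
FROM @Participant AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM @Participation AS pn
    WHERE pn.ParticipantID = p.ParticipantID
);
```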

Nested pass-through queries?

I have an ODBC connection to a SQL Server database, and because I'm returning large record sets with my queries, I've found that it's faster to run pass-through queries than native Access queries.
But I'm finding it hard to write and organize my queries because, as far as I know, I can't save several different pass-through queries and join them in another pass-through query. I have read-only access to this database, so I can't save stored procedures in SQL Server and then reference them in the pass-through.
For example, suppose I want to get only those entries with the maximum value of o_version from the following query:
select d.o_filename,d.o_version,parent.o_projectname
from dms_doc d
left join
dms_proj p
on
d.o_projectno=p.o_projectno
left join
dms_proj parent
on
p.o_parentno=parent.o_projectno
where
p.o_projectname='ABC'
and
lower(left(right(d.o_filename,4),3))='xls'
and
charindex('xyz',lower(d.o_filename))=0
I want to get only those entries with the maximum value of d.o_version. Ordinarily I would save this as a query called, e.g., abc, and then write another query abcMax:
select * from abc
inner join
(select o_filename,o_projectname,max(o_version) as maxVersion from abc
group by o_filename,o_projectname) abc2
on
abc.o_filename=abc2.o_filename
and
abc.o_projectname=abc2.o_projectname
where
abc.o_version=abc2.maxVersion
But if I can't store abc as a query that can be used in the pass-through query abcMax, then not only do I have to copy the entire body of abc into abcMax several times, but if I make any changes to the content of abc, then I need to make them to every copy that's embedded in abcMax.
The alternative is to write abcMax as a regular Access query that calls abc, but that will reduce the performance because the query is now being handled by ACE instead of SQL Server.
Is there any way to nest stored pass-through queries in Access? Or is creating stored procedures in SQL Server the only way to accomplish this?

If you have (or can get) permission to create temporary tables on the SQL Server then you might be able to use them to some advantage. For example, you could run one pass-through query to create a temporary table with the results from the first query (vastly simplified, in this example):
CREATE TABLE #abc (o_filename NVARCHAR(50), o_version INT, o_projectname NVARCHAR(50));
INSERT INTO #abc SELECT o_filename, o_version, o_projectname FROM dms_doc;
and then your second pass-through query could just reference the temporary table
select * from #abc
inner join
(select o_filename,o_projectname,max(o_version) as maxVersion from #abc
group by o_filename,o_projectname) abc2
on
#abc.o_filename=abc2.o_filename
and
#abc.o_projectname=abc2.o_projectname
where
#abc.o_version=abc2.maxVersion
When you're finished you can run a pass-through query to explicitly delete the temporary table
DROP TABLE #abc
or SQL Server will delete it for you automatically when your connection to the SQL Server closes.

For anyone still needing this info:
Pass-through queries allow the use of CTEs (common table expressions), just as Oracle SQL does. The effect is similar to creating multiple saved SELECT queries, but it is much faster and more efficient, and it avoids the clutter and confusion of “stacked” SELECT queries because you can see all the underlying queries in one view.
Example:
With Prep AS (
SELECT A.name,A.city
FROM Customers AS A
)
SELECT P.city, COUNT(P.name) AS clients_per_city
FROM Prep AS P
GROUP BY P.city
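
Applied to the abc / abcMax case from the question, the same idea might look like the sketch below. The body of abc is written once as a CTE and referenced twice. The table variables @dms_doc and @dms_proj and their rows are invented stand-ins so the script runs by itself; in a real pass-through query you would reference dms_doc and dms_proj directly.

```sql
-- Hypothetical sample data standing in for dms_proj and dms_doc.
DECLARE @dms_proj TABLE (o_projectno INT, o_parentno INT, o_projectname NVARCHAR(50));
DECLARE @dms_doc  TABLE (o_filename NVARCHAR(50), o_version INT, o_projectno INT);

INSERT INTO @dms_proj VALUES (1, NULL, N'Root'), (2, 1, N'ABC');
INSERT INTO @dms_doc  VALUES (N'budget.xlsx', 1, 2), (N'budget.xlsx', 2, 2), (N'xyz_plan.xlsx', 1, 2);

-- The statement before WITH must end with a semicolon.
WITH abc AS (
    -- Same filter logic as the query in the question.
    SELECT d.o_filename, d.o_version, parent.o_projectname
    FROM @dms_doc AS d
    LEFT JOIN @dms_proj AS p ON d.o_projectno = p.o_projectno
    LEFT JOIN @dms_proj AS parent ON p.o_parentno = parent.o_projectno
    WHERE p.o_projectname = 'ABC'
      AND LOWER(LEFT(RIGHT(d.o_filename, 4), 3)) = 'xls'
      AND CHARINDEX('xyz', LOWER(d.o_filename)) = 0
)
SELECT abc.o_filename, abc.o_version, abc.o_projectname
FROM abc
INNER JOIN (
    -- Highest version per file and project, computed from the same CTE.
    SELECT o_filename, o_projectname, MAX(o_version) AS maxVersion
    FROM abc
    GROUP BY o_filename, o_projectname
) AS abc2
    ON abc.o_filename = abc2.o_filename
   AND abc.o_projectname = abc2.o_projectname
WHERE abc.o_version = abc2.maxVersion;
```

With this sample data only the version 2 row for budget.xlsx comes back. A change to the filter logic then only has to be made in one place, inside the CTE.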

How to force SQL Server to process CONTAINS clauses before WHERE clauses?

I have a SQL query that uses both standard WHERE clauses and full text index CONTAINS clauses. The query is built dynamically from code and includes a variable number of WHERE and CONTAINS clauses.
In order for the query to be fast, it is very important that the full text index be searched before the rest of the criteria are applied.
However, SQL Server chooses to process the WHERE clauses before the CONTAINS clauses, which causes table scans and makes the query very slow.
I'm able to rewrite this using two queries and a temporary table. When I do so, the query executes 10 times faster. But I don't want to do that in the code that creates the query because it is too complex.
Is there a way to force SQL Server to process the CONTAINS before anything else? I can't force a plan (USE PLAN) because the query is built dynamically and varies a lot.
Note: I have the same problem on SQL Server 2005 and SQL Server 2008.

You can signal your intent to the optimiser like this
SELECT
*
FROM
(
SELECT *
FROM
WHERE
CONTAINS
) T1
WHERE
(normal conditions)
However, SQL is declarative: you say what you want, not how to do it. So the optimiser may decide to ignore the nesting above.
You can force the derived table with CONTAINS to be materialised before the classic WHERE clause is applied. I won't guarantee performance.
SELECT
*
FROM
(
SELECT TOP 2000000000
*
FROM
....
WHERE
CONTAINS
ORDER BY
SomeID
) T1
WHERE
(normal conditions)
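
To make the placeholders concrete, here is one way the TOP version could be filled in. Everything specific in it is an assumption for illustration: the table dbo.Documents, its columns DocumentID, Body and CategoryID, the search phrase, and the existence of a full-text index on Body are not from the question. It will only run against a table that really has a full-text index, and as noted above there is no guarantee about how the optimiser treats it.

```sql
-- Assumes a hypothetical table dbo.Documents (DocumentID, Body, CategoryID)
-- with a full-text index on Body. None of these names come from the question.
SELECT T1.DocumentID, T1.CategoryID
FROM (
    -- TOP plus ORDER BY is the part intended to keep this derived table
    -- from being merged into the outer query.
    SELECT TOP (2000000000) d.DocumentID, d.CategoryID
    FROM dbo.Documents AS d
    WHERE CONTAINS(d.Body, N'"merge join"')
    ORDER BY d.DocumentID
) AS T1
WHERE T1.CategoryID = 7;  -- stands in for the "normal conditions"
```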

Try doing it with 2 queries without temp tables:
SELECT *
FROM table
WHERE id IN (
SELECT id
FROM table
WHERE contains_criteria
)
AND further_where_clauses

As I noted above, this is NOT as clean a way to "materialize" the derived table as the TOP clause that @gbn proposed, but a loop join hint forces an order of evaluation, and it has worked for me in the past (admittedly usually with two different tables involved). There are a couple of problems though:
The query is ugly
You still don't get any guarantee that the other WHERE conditions aren't evaluated until after the join (I'll be interested to see what you get)
Here it is though, given that you asked:
SELECT OriginalTable.XXX
FROM (
SELECT XXX
FROM OriginalTable
WHERE
CONTAINS XXX
) AS ContainsCheck
INNER LOOP JOIN OriginalTable
ON ContainsCheck.PrimaryKeyColumns = OriginalTable.PrimaryKeyColumns
AND OriginalTable.OtherWhereConditions = OtherValues

Joining null in SQL Server, Oracle and informatica

I have two tables to join on a column (say emp_id). If emp_id contains NULL values in both tables, how will SQL Server and Oracle treat them?
I read that Informatica ignores NULL rows when joining. If I handle the NULLs by substituting -1, the NULL rows will effectively cross join with each other, which I don't want.
What can I do here?
I can't simply discard the rows that have NULL.
Thanks

Perhaps you want a left outer join? See wikipedia
Here's how you do it with Oracle
Here's the SQL Server documentation for left outer join.

You can't join on colA = colB and expect NULLs to compare as equal. Depending on your needs (assuming perhaps some sort of table synchronisation need below), three approaches I can think of are:
Use COALESCE to substitute a value such as -1 in place of null if a suitable value exists that can never occur in your actual data. COALESCE(Table1.colA,-1) = COALESCE(Table2.colB,-1)
Use both an IS NULL and equality check on all joining columns.
Use INTERSECT (nulls will be treated as equal). Possibly in a derived table that you can JOIN back onto.
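
The three approaches above can be sketched as follows. The table variables and their rows are made up for illustration, with Table1.colA and Table2.colB named as in the COALESCE expression above, and -1 is assumed never to occur in the real data.

```sql
-- Invented sample data: each table has one ordinary row and one NULL key.
DECLARE @Table1 TABLE (colA INT NULL, name1 VARCHAR(20));
DECLARE @Table2 TABLE (colB INT NULL, name2 VARCHAR(20));

INSERT INTO @Table1 VALUES (1, 'a1'), (NULL, 'a-null');
INSERT INTO @Table2 VALUES (1, 'b1'), (NULL, 'b-null');

-- Baseline: plain equality never matches NULL to NULL, so only the key 1 rows pair up.
SELECT t1.name1, t2.name2
FROM @Table1 AS t1
INNER JOIN @Table2 AS t2 ON t1.colA = t2.colB;

-- 1. COALESCE with a sentinel value (-1 is assumed unused in the real data).
SELECT t1.name1, t2.name2
FROM @Table1 AS t1
INNER JOIN @Table2 AS t2 ON COALESCE(t1.colA, -1) = COALESCE(t2.colB, -1);

-- 2. Equality check plus an explicit IS NULL check on the joining columns.
SELECT t1.name1, t2.name2
FROM @Table1 AS t1
INNER JOIN @Table2 AS t2
    ON t1.colA = t2.colB
    OR (t1.colA IS NULL AND t2.colB IS NULL);

-- 3. INTERSECT treats NULLs as equal, so both 1 and NULL are returned.
SELECT colA FROM @Table1
INTERSECT
SELECT colB FROM @Table2;
```

Note that with approaches 1 and 2, every NULL row on one side pairs with every NULL row on the other, so several NULLs on both sides still multiply out.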

Large sets of sql parameters in query

I have two disconnected sql servers that have to have correlated queries run between them. What is the best way to run a query such as:
select * from table where id in (1..100000)
Where the 1..100000 are ids I'm getting from the other database and are not contiguous.
The in clause doesn't support that many parameters, and creating a temp table to do a subquery on takes forever. Are there any other options? Using Sql Server 2005 as the DB, C# as my lang.
Linking the servers is not an option.

If possible, set them up as linked servers. Then you can query the other server directly.
Once you have your link set up, you should also consider that an INNER JOIN or EXISTS will likely perform better.

Syntax might be off slightly, as my server to server MSSQL is rusty, but...
Select * from table where id in (select id from [Server_Two\Some_Instance].[SomeDatabase].[user].table2)

To work around the number of IN parameters allowed without querying across servers, you can bucket them into multiple queries with subsets of the ids and connect them with a UNION. Kinda kludgy, but it should work.

You could use a function to break down the input string and return a table. There are plenty of questions on here about how to have dynamic parameters with in clauses which should have an example.
If you can link your servers you could join between the two servers.
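
As a rough illustration of the "break down the input string" idea, the sketch below splits a comma-separated list with the XML data type, which is available in SQL Server 2005, and joins the result to the target table. It is written inline rather than wrapped in a function, and the table variable @atable, the variable @ids, and the sample values are all invented for the example. It assumes the list contains only integers and commas.

```sql
-- Stand-in for the real table being filtered (hypothetical name and rows).
DECLARE @atable TABLE (id INT PRIMARY KEY, label VARCHAR(20));
INSERT INTO @atable (id, label)
SELECT 3, 'three' UNION ALL
SELECT 5, 'five'  UNION ALL
SELECT 42, 'forty-two';

-- The ids from the other server, passed from C# as a single string parameter.
DECLARE @ids VARCHAR(MAX);
SET @ids = '3,17,42';

-- Turn '3,17,42' into <i>3</i><i>17</i><i>42</i> so it can be shredded into rows.
DECLARE @x XML;
SET @x = CAST('<i>' + REPLACE(@ids, ',', '</i><i>') + '</i>' AS XML);

-- Join the shredded ids to the table instead of using a huge IN list.
SELECT a.id, a.label
FROM @atable AS a
INNER JOIN (
    SELECT n.value('.', 'INT') AS id
    FROM @x.nodes('/i') AS t(n)
) AS ids
    ON ids.id = a.id;
```

With the sample data this returns the rows for ids 3 and 42. How well it holds up at 100,000 ids is something you would need to measure.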

The other option which it sounds like you've explored is creating a temp table with the id's that are going to be used as criteria for a join to the primary table.
select * from atable a
inner join #temptable t on a.id = t.id
Since they're ID's I'm assuming they are indexed.

How are you generating the in? If it is text, you can generate it differently. Or does this cause the same error?
SELECT.....
WHERE id in (1..10000)
OR id in (10001..20000)
-- etc.
