Large sets of SQL parameters in query - sql-server

I have two disconnected SQL Servers that need to have correlated queries run between them. What is the best way to run a query such as:
select * from table where id in (1..100000)
Where the 1..100000 are ids I'm getting from the other database and are not contiguous.
The IN clause doesn't support that many parameters, and creating a temp table to run a subquery against takes forever. Are there any other options? Using SQL Server 2005 as the DB, C# as my language.
Linking the servers is not an option.

If possible, set them up as linked servers. Then you can query the other server directly.
Once you have your link set up, you should also consider that an INNER JOIN or EXISTS will likely perform better.

Syntax might be off slightly, as my server to server MSSQL is rusty, but...
Select * from table where id in (select id from [Server_Two\Some_Instance].[SomeDatabase].[user].table2)
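Once the link is in place, the INNER JOIN form of the same query would look something like the sketch below (same four-part name as above, hypothetical id column; an EXISTS subquery would be similar):
Select t1.*
from table t1
inner join [Server_Two\Some_Instance].[SomeDatabase].[user].table2 t2
on t1.id = t2.id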

To work around the limit on the number of IN parameters without querying across servers, you can bucket the ids into multiple queries, each with a subset, and connect them with a UNION. Kinda kludgy, but it should work.
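A minimal sketch of the bucketing, with a hypothetical table atable and placeholder ids; each SELECT keeps its IN list under the limit, and UNION ALL is safe here because the buckets are disjoint:
select * from atable where id in (1, 5, 12 /* ...first bucket... */)
union all
select * from atable where id in (50001, 50007 /* ...next bucket... */)
union all
select * from atable where id in (99996, 100000 /* ...last bucket... */)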

You could use a function to break the input string down and return a table. There are plenty of questions on here about how to have dynamic parameters in IN clauses, which should have an example.
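For example, here is a sketch of such a function for SQL Server 2005 (which predates STRING_SPLIT); the function name and the assumption of a comma-delimited list of integers are mine:
CREATE FUNCTION dbo.SplitIds (@list VARCHAR(MAX))
RETURNS @ids TABLE (id INT PRIMARY KEY) -- assumes no duplicate ids in the list
AS
BEGIN
    DECLARE @pos INT;
    WHILE LEN(@list) > 0
    BEGIN
        SET @pos = CHARINDEX(',', @list);
        IF @pos = 0
        BEGIN
            -- last (or only) token in the list
            INSERT INTO @ids (id) VALUES (CAST(@list AS INT));
            SET @list = '';
        END
        ELSE
        BEGIN
            -- take the token before the comma, then drop it from the list
            INSERT INTO @ids (id) VALUES (CAST(LEFT(@list, @pos - 1) AS INT));
            SET @list = SUBSTRING(@list, @pos + 1, LEN(@list));
        END
    END
    RETURN;
END
You would then join to it instead of using IN:
select a.* from atable a inner join dbo.SplitIds(@idList) t on a.id = t.id;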
If you can link your servers you could join between the two servers.

The other option, which it sounds like you've explored, is creating a temp table with the ids that are going to be used as criteria for a join to the primary table.
select * from atable a
inner join #temptable t on a.id = t.id
Since they're IDs, I'm assuming they are indexed.
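On the "takes forever" point from the question: one thing that often helps (a sketch, assuming the ids are pushed over from C# in batches, e.g. via SqlBulkCopy) is to create the temp table with its key up front so the join has an index to use:
CREATE TABLE #temptable (id INT NOT NULL PRIMARY KEY);
-- bulk insert the ids here (SqlBulkCopy or batched INSERTs), then:
select a.*
from atable a
inner join #temptable t on a.id = t.id;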

How are you generating the IN list? If it is text, you can generate it differently. Or does this cause the same error?
SELECT.....
WHERE id in (1..10000)
OR id in (10001..20000)
-- etc.

Related

Index for using IN clause in where condition

My application accesses data from a table in SQL Server. Consider the table name to be PurchaseDetail, with some other columns.
The SELECT query has the WHERE clauses below:
1. name - the name column has only 10000 distinct values.
2. createdDateTime
The actual query is
select *
from PurchaseDetail
where name in (~2000 names)
and createdDateTime = 'someDateValue';
The SQL Tuning Advisor gave some recommendations. I tried those recommended indexes. The performance increased a bit, but not enough.
Is there anything wrong with my query? Or is it possible to change/improve my SELECT query?
I haven't used IN in a WHERE clause before. My table has more than 100 million records.
Any suggestions, please?
In this case, using IN for that much data is not good at all.
The best way is to use an INNER JOIN instead.
It would be better to insert those names into a temp table and INNER JOIN it with your SELECT query.
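A hedged sketch of that, reusing the column names from the question (the temp table name and the VARCHAR length are assumptions):
CREATE TABLE #names (name VARCHAR(100) NOT NULL PRIMARY KEY);
-- insert the ~2000 names here, then:
select pd.*
from PurchaseDetail pd
inner join #names n on pd.name = n.name
where pd.createdDateTime = 'someDateValue';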

How can I reference tables from multiple databases within the same server in a common table expression (CTE)?

I have a SQL Server 2014 Express instance with multiple databases. One of them has general tables with information common to the remaining databases (let's call this database UniversalData).
The other databases have information that is pertinent to a specific site (let's call one of these databases Site01Data). The universal data may change and I don't want to replicate it regularly to the other site-specific databases, so I want to include the UniversalData table in some queries, some of which involve CTEs.
What I am trying to accomplish:
WITH CTE1 AS
(
SELECT *
FROM UniversalData.dbo.someTable
),
CTE2 AS
(
SELECT *
FROM Site01Data.dbo.anotherTable
),
CTE3 AS
(
SELECT CTE1.field1, CTE2.field2
FROM CTE1
JOIN CTE2 ON CTE1.idx = CTE2.idx
)
SELECT *
FROM CTE3;
This doesn't generate an error, but I seem to get no data from CTE1 in my final query (null result set). Intuitively, does this mean it is saving a temp table in the UniversalData database that is not accessible from the Site01Data database?
How can I use a CTE with tables from different databases on the same server?
There are lots of ways to do this.
You could read the tables in one database into a temp table in the second database and then join to it, or join both of them on the fly.
But first: refrain from doing SELECT *; specify the columns.
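A sketch of the temp-table option, using the database, table, and column names from the question:
SELECT idx, field1
INTO #universal
FROM UniversalData.dbo.someTable;

SELECT u.field1, t2.field2
FROM Site01Data.dbo.anotherTable t2
INNER JOIN #universal u ON u.idx = t2.idx;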
You could go
select t1.column1,t2.column2
from UniversalData.dbo.someTable t1
inner join Site01Data.dbo.anotherTable t2
on t1.idx = t2.idx
and so on. It depends on which way you want to specify the join and what sort of join you want to choose.
This assumes that both databases are on the same instance; otherwise you will need linked servers.
Specify servername.Site01Data.dbo.table etc., and use linked servers if appropriate across different server names.

Joining tables from two Oracle databases in SAS

I am joining two tables together that are located in two separate oracle databases.
I am currently doing this in SAS by creating two libname connections to each database and then simply using something like the below.
libname dbase_a oracle user= etc... ;
libname dbase_b oracle user= etc... ;
proc sql;
create table t1 as
select a.*, b.*
from dbase_a.table1 a inner join dbase_b.table2 b
on a.id = b.id;
quit;
However, the query is painfully slow. Can you suggest any better options to speed up such a query (short of going down the path of creating a database link)?
Many thanks for looking at this.
If those two databases are on the same server and you are able to execute cross-database queries in Oracle, you could try using SQL pass-through:
proc sql;
connect to oracle (user= password= <...>);
create table t1 as
select * from connection to oracle (
select a.*, b.*
from dbase_a.schema_a.table1 a
inner join dbase_b.schema_b.table2 b
on a.id = b.id
);
disconnect from oracle;
quit;
I think that, in most cases, SAS attempts as much as possible to have the query executed on the database server, even if pass-through was not explicitly specified. However, when the query involves tables on different servers, or different databases on a system that does not allow cross-database queries, or when the query contains SAS-specific functions that SAS cannot translate into something valid for the DBMS, SAS will indeed resort to 'downloading' the complete tables and processing the query locally, which can evidently be painfully inefficient.
The SELECT is for all columns from each table, and the inner join is on the id values only. Because the join criteria are evaluated on data coming from disparate sources, the baggage of all those columns can be a big factor in the timing, because even non-matching rows must be downloaded (by the libname engine, within the SQL execution context) during the ON evaluation.
One approach would be to:
Select only the id from each table
Find the intersection
Upload the intersection to each server (as a scratch table)
Utilize the intersection on each server as pass-through selection criteria within the final join in SAS
There are a couple of variations depending on the expected number of id matches, the number of distinct ids in each table, or knowing which of table-1 and table-2 is SMALL and which is BIG. For a large number of id matches that need to be transferred back to a server you will probably want some form of bulk copy. For a relatively small number of ids in the intersection you might get away with enumerating them directly in a SQL statement using the IN () construct. The size of a SQL statement can be limited by the database, the SAS/ACCESS to ORACLE engine, or the SAS macro system.
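For the small-intersection case, a sketch of that IN () enumeration (the macro-variable and output table names are mine; this assumes the INTERSECTION table of matching ids built as in the code below):
proc sql noprint;
  /* build a comma-separated list of the matching ids */
  select id into :idlist separated by ','
  from INTERSECTION;
quit;

proc sql;
  connect using SOURCE1 as PASS1;
  create table SMALLJOIN as
  select * from connection to PASS1 (
    select * from table1 where id in (&idlist)
  );
  disconnect from PASS1;
quit;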
Consider a data scenario in which it has been determined that the potential number of matching ids would be too large for an in (id-1, ..., id-n) construct. In such a case the list of matching ids is dealt with in a tabular manner:
libname SOURCE1 ORACLE ....;
libname SOURCE2 ORACLE ....;
libname SCRATCH1 ORACLE ... must specify a scratch schema ...;
libname SCRATCH2 ORACLE ... must specify a scratch schema ...;
proc sql;
connect using SOURCE1 as PASS1;
connect using SOURCE2 as PASS2;
* compute intersection from only id data sent to SAS;
create table INTERSECTION as
(select id from connection to PASS1 (select id from table1))
intersect
(select id from connection to PASS2 (select id from table2))
;
* upload intersection to each server;
create table SCRATCH1.ids as select id from INTERSECTION;
create table SCRATCH2.ids as select id from INTERSECTION;
* compute inner join from only data that matches intersection;
create table INNERJOIN as select ONE.*, TWO.* from
(select * from connection to PASS1 (
select * from oracle-path-to-schema.table1
where id in (select id from oracle-path-to-scratch.ids)
)) as ONE
JOIN
(select * from connection to PASS2 (
select * from oracle-path-to-schema.table2
where id in (select id from oracle-path-to-scratch.ids)
)) as TWO
on ONE.id = TWO.id;
...
For the case where both table-1 and table-2 have very large numbers of ids that exceed the resource capacity of your SAS platform, you will also have to iterate the approach over ranges of id counts. Techniques for determining the range criteria for each iteration are a tale for another day.

Nested pass-through queries?

I have an ODBC connection to a SQL Server database, and because I'm returning large record sets with my queries, I've found that it's faster to run pass-through queries than native Access queries.
But I'm finding it hard to write and organize my queries because, as far as I know, I can't save several different pass-through queries and join them in another pass-through query. I have read-only access to this database, so I can't save stored procedures in SQL Server and then reference them in the pass-through.
For example, suppose I want to get only those entries with the maximum value of o_version from the following query:
select d.o_filename,d.o_version,parent.o_projectname
from dms_doc d
left join
dms_proj p
on
d.o_projectno=p.o_projectno
left join
dms_proj parent
on
p.o_parentno=parent.o_projectno
where
p.o_projectname='ABC'
and
lower(left(right(d.o_filename,4),3))='xls'
and
charindex('xyz',lower(d.o_filename))=0
I want to get only those entries with the maximum value of d.o_version. Ordinarily I would save this as a query called, e.g., abc, and then write another query abcMax:
select * from abc
inner join
(select o_filename,o_projectname,max(o_version) as maxVersion from abc
group by o_filename,o_projectname) abc2
on
abc.o_filename=abc2.o_filename
and
abc.o_projectname=abc2.o_projectname
where
abc.o_version=abc2.maxVersion
But if I can't store abc as a query that can be used in the pass-through query abcMax, then not only do I have to copy the entire body of abc into abcMax several times, but if I make any changes to the content of abc, then I need to make them to every copy that's embedded in abcMax.
The alternative is to write abcMax as a regular Access query that calls abc, but that will reduce the performance because the query is now being handled by ACE instead of SQL Server.
Is there any way to nest stored pass-through queries in Access? Or is creating stored procedures in SQL Server the only way to accomplish this?
If you have (or can get) permission to create temporary tables on the SQL Server then you might be able to use them to some advantage. For example, you could run one pass-through query to create a temporary table with the results from the first query (vastly simplified, in this example):
CREATE TABLE #abc (o_filename NVARCHAR(50), o_version INT, o_projectname NVARCHAR(50));
INSERT INTO #abc SELECT o_filename, o_version, o_projectname FROM dms_doc;
and then your second pass-through query could just reference the temporary table
select * from #abc
inner join
(select o_filename,o_projectname,max(o_version) as maxVersion from #abc
group by o_filename,o_projectname) abc2
on
#abc.o_filename=abc2.o_filename
and
#abc.o_projectname=abc2.o_projectname
where
#abc.o_version=abc2.maxVersion
When you're finished you can run a pass-through query to explicitly delete the temporary table
DROP TABLE #abc
or SQL Server will delete it for you automatically when your connection to the SQL Server closes.
For anyone still needing this info:
Pass-through queries allow the use of CTEs, just as they can be used with Oracle SQL. This is similar to creating multiple select queries, but much faster and more efficient, without the clutter and confusion of “stacked” SELECT queries, since you can see all the underlying queries in one view.
Example:
With Prep AS (
SELECT A.name,A.city
FROM Customers AS A
)
SELECT P.city, COUNT(P.name) AS clients_per_city
FROM Prep AS P
GROUP BY P.city
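Applied to the abc/abcMax example from the question above, the whole thing could hypothetically be collapsed into one pass-through query, reusing the question's own SQL as the CTE body; the CTE is defined once and referenced twice, so a change to abc only has to be made in one place:
with abc as (
select d.o_filename, d.o_version, parent.o_projectname
from dms_doc d
left join dms_proj p on d.o_projectno = p.o_projectno
left join dms_proj parent on p.o_parentno = parent.o_projectno
where p.o_projectname = 'ABC'
and lower(left(right(d.o_filename, 4), 3)) = 'xls'
and charindex('xyz', lower(d.o_filename)) = 0
)
select abc.*
from abc
inner join
(select o_filename, o_projectname, max(o_version) as maxVersion
from abc
group by o_filename, o_projectname) abc2
on abc.o_filename = abc2.o_filename
and abc.o_projectname = abc2.o_projectname
where abc.o_version = abc2.maxVersion;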

How to force SQL Server to process CONTAINS clauses before WHERE clauses?

I have a SQL query that uses both standard WHERE clauses and full text index CONTAINS clauses. The query is built dynamically from code and includes a variable number of WHERE and CONTAINS clauses.
In order for the query to be fast, it is very important that the full text index be searched before the rest of the criteria are applied.
However, SQL Server chooses to process the WHERE clauses before the CONTAINS clauses, and that causes table scans and makes the query very slow.
I'm able to rewrite this using two queries and a temporary table. When I do so, the query executes 10 times faster. But I don't want to do that in the code that creates the query because it is too complex.
Is there a way to force SQL Server to process the CONTAINS before anything else? I can't force a plan (USE PLAN) because the query is built dynamically and varies a lot.
Note: I have the same problem on SQL Server 2005 and SQL Server 2008.
You can signal your intent to the optimiser like this
SELECT
*
FROM
(
SELECT *
FROM SomeTable
WHERE CONTAINS(SomeColumn, @searchTerm)
) T1
WHERE
(normal conditions)
However, SQL is declarative: you say what you want, not how to do it. So the optimiser may decide to ignore the nesting above.
You can force the derived table with CONTAINS to be materialised before the classic WHERE clause is applied. I won't guarantee performance.
SELECT
*
FROM
(
SELECT TOP 2000000000
*
FROM
....
WHERE CONTAINS(SomeColumn, @searchTerm)
ORDER BY
SomeID
) T1
WHERE
(normal conditions)
Try doing it with 2 queries without temp tables:
SELECT *
FROM table
WHERE id IN (
SELECT id
FROM table
WHERE contains_criteria
)
AND further_where_clauses
As I noted above, this is NOT as clean a way to "materialize" the derived table as the TOP clause that #gbn proposed, but a loop join hint forces an order of evaluation and has worked for me in the past (admittedly usually with two different tables involved). There are a couple of problems, though:
The query is ugly
You still don't get any guarantees that the other WHERE parameters won't be evaluated until after the join (I'll be interested to see what you get)
Here it is though, given that you asked:
SELECT OriginalTable.XXX
FROM (
SELECT XXX
FROM OriginalTable
WHERE CONTAINS(XXX, @searchTerm)
) AS ContainsCheck
INNER LOOP JOIN OriginalTable
ON ContainsCheck.PrimaryKeyColumns = OriginalTable.PrimaryKeyColumns
AND OriginalTable.OtherWhereConditions = OtherValues
