Nested pass-through queries? - sql-server

I have an ODBC connection to a SQL Server database, and because I'm returning large record sets with my queries, I've found that it's faster to run pass-through queries than native Access queries.
But I'm finding it hard to write and organize my queries because, as far as I know, I can't save several different pass-through queries and join them in another pass-through query. I have read-only access to this database, so I can't save stored procedures in SQL Server and then reference them in the pass-through.
For example, suppose I want to get only those entries with the maximum value of o_version from the following query:
select d.o_filename, d.o_version, parent.o_projectname
from dms_doc d
left join dms_proj p on d.o_projectno = p.o_projectno
left join dms_proj parent on p.o_parentno = parent.o_projectno
where p.o_projectname = 'ABC'
and lower(left(right(d.o_filename,4),3)) = 'xls'
and charindex('xyz', lower(d.o_filename)) = 0
I want to get only those entries with the maximum value of d.o_version. Ordinarily I would save this as a query called, e.g., abc, and then write another query abcMax:
select * from abc
inner join
(select o_filename,o_projectname,max(o_version) as maxVersion from abc
group by o_filename,o_projectname) abc2
on
abc.o_filename=abc2.o_filename
and
abc.o_projectname=abc2.o_projectname
where
abc.o_version=abc2.maxVersion
But if I can't store abc as a query that can be used in the pass-through query abcMax, then not only do I have to copy the entire body of abc into abcMax several times, but if I make any changes to the content of abc, then I need to make them to every copy that's embedded in abcMax.
The alternative is to write abcMax as a regular Access query that calls abc, but that will reduce the performance because the query is now being handled by ACE instead of SQL Server.
Is there any way to nest stored pass-through queries in Access? Or is creating stored procedures in SQL Server the only way to accomplish this?

If you have (or can get) permission to create temporary tables on the SQL Server then you might be able to use them to some advantage. For example, you could run one pass-through query to create a temporary table with the results from the first query (vastly simplified, in this example):
CREATE TABLE #abc (o_filename NVARCHAR(50), o_version INT, o_projectname NVARCHAR(50));
INSERT INTO #abc SELECT o_filename, o_version, o_projectname FROM dms_doc;
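With the actual query from the question, the first pass-through could instead use SELECT ... INTO, so that SQL Server creates the temporary table and infers the column types in one step (a sketch reusing the query from the question):
select d.o_filename, d.o_version, parent.o_projectname
into #abc
from dms_doc d
left join dms_proj p on d.o_projectno = p.o_projectno
left join dms_proj parent on p.o_parentno = parent.o_projectno
where p.o_projectname = 'ABC'
and lower(left(right(d.o_filename,4),3)) = 'xls'
and charindex('xyz', lower(d.o_filename)) = 0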
and then your second pass-through query could just reference the temporary table
select * from #abc
inner join
(select o_filename,o_projectname,max(o_version) as maxVersion from #abc
group by o_filename,o_projectname) abc2
on
#abc.o_filename=abc2.o_filename
and
#abc.o_projectname=abc2.o_projectname
where
#abc.o_version=abc2.maxVersion
When you're finished you can run a pass-through query to explicitly delete the temporary table
DROP TABLE #abc
or SQL Server will delete it for you automatically when your connection to the SQL Server closes.

For anyone still needing this info:
Pass-through queries allow the use of common table expressions (CTEs), just as in Oracle SQL. A CTE is similar to creating multiple saved SELECT queries, but much faster and more efficient, and it avoids the clutter and confusion of “stacked” SELECT queries because you can see all the underlying queries in one view.
Example:
With Prep AS (
SELECT A.name,A.city
FROM Customers AS A
)
SELECT P.city, COUNT(P.name) AS clients_per_city
FROM Prep AS P
GROUP BY P.city
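Applied to the original question, abc and abcMax collapse into a single pass-through query with a CTE (a sketch reusing the queries from the question):
with abc as (
select d.o_filename, d.o_version, parent.o_projectname
from dms_doc d
left join dms_proj p on d.o_projectno = p.o_projectno
left join dms_proj parent on p.o_parentno = parent.o_projectno
where p.o_projectname = 'ABC'
and lower(left(right(d.o_filename,4),3)) = 'xls'
and charindex('xyz', lower(d.o_filename)) = 0
)
select * from abc
inner join (select o_filename, o_projectname, max(o_version) as maxVersion
from abc
group by o_filename, o_projectname) abc2
on abc.o_filename = abc2.o_filename
and abc.o_projectname = abc2.o_projectname
where abc.o_version = abc2.maxVersion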

Related

Joining tables from two Oracle databases in SAS

I am joining two tables together that are located in two separate oracle databases.
I am currently doing this in SAS by creating two libname connections, one to each database, and then simply using something like the below.
libname dbase_a oracle user= etc... ;
libname dbase_b oracle user= etc... ;
proc sql;
create table t1 as
select a.*, b.*
from dbase_a.table1 a inner join dbase_b.table2 b
on a.id = b.id;
quit;
However, the query is painfully slow. Can you suggest any better options to speed it up (short of going down the path of creating a database link)?
Many thanks for looking at this.
If those two databases are on the same server and you are able to execute cross-database queries in Oracle, you could try using SQL pass-through:
proc sql;
connect to oracle (user= password= <...>);
create table t1 as
select * from connection to oracle (
select a.*, b.*
from dbase_a.schema_a.table1 a
inner join dbase_b.schema_b.table2 b
on a.id = b.id
);
disconnect from oracle;
quit;
I think that, in most cases, SAS attempts as much as possible to have the query executed on the database server, even if pass-through was not explicitly specified. However, when the query references tables that are on different servers, or on different databases of a system that does not allow cross-database queries, or when it contains SAS-specific functions that SAS cannot translate into something valid on the DBMS, SAS will indeed resort to 'downloading' the complete tables and processing the query locally, which can evidently be painfully inefficient.
The SELECT pulls all columns from each table, while the inner join is on the id values only. Because the join criteria are evaluated on data coming from disparate sources, the baggage of all those columns can be a big factor in the timing: even non-matching rows must be downloaded (by the libname engine, within the SQL execution context) during the ON evaluation.
One approach would be to:
Select only the id from each table
Find the intersection
Upload the intersection to each server (as a scratch table)
Utilize the intersection on each server as pass through selection criteria within the final join in SAS
There are a couple of variations depending on the expected number of id matches, the number of distinct ids in each table, or knowing which of table-1 and table-2 is SMALL and which is BIG. For a large number of id matches that need to be transferred back to a server you will probably want some form of bulk copy. For a relatively small number of ids in the intersection you might get away with enumerating them directly in a SQL statement using the IN () construct. The size of a SQL statement can be limited by the database, the SAS/ACCESS to ORACLE engine, or the SAS macro system.
Consider a data scenario in which the potential number of matching ids has been determined to be too large for an IN (id-1, ..., id-n) construct. In such a case the list of matching ids is dealt with in a tabular manner:
libname SOURCE1 ORACLE ....;
libname SOURCE2 ORACLE ....;
libname SCRATCH1 ORACLE ... must specify a scratch schema ...;
libname SCRATCH2 ORACLE ... must specify a scratch schema ...;
proc sql;
connect using SOURCE1 as PASS1;
connect using SOURCE2 as PASS2;
* compute intersection from only id data sent to SAS;
create table INTERSECTION as
(select id from connection to PASS1 (select id from table1))
intersect
(select id from connection to PASS2 (select id from table2))
;
* upload intersection to each server;
create table SCRATCH1.ids as select id from INTERSECTION;
create table SCRATCH2.ids as select id from INTERSECTION;
* compute inner join from only data that matches intersection;
create table INNERJOIN as select ONE.*, TWO.* from
(select * from connection to PASS1 (
select * from oracle-path-to-schema.table1
where id in (select id from oracle-path-to-scratch.ids)
)) as ONE
JOIN
(select * from connection to PASS2 (
select * from oracle-path-to-schema.table2
where id in (select id from oracle-path-to-scratch.ids)
)) as TWO
on ONE.id = TWO.id;
...
For the case of both table-1 and table-2 having very large numbers of ids that exceed the resource capacity of your SAS platform, you will also have to iterate the approach over ranges of id counts. Techniques for determining the range criteria for each iteration are a tale for another day.

Use result of stored procedure to join to a table

I have a stored procedure that returns a dataset from a dynamic pivot query (meaning the pivot columns aren't known until run-time because they are driven by data).
The first column in this dataset is a product id. I want to join that product id with another product table that has all sorts of other columns that were created at design time.
So, I have a normal table with a product id column and I have a "dynamic" dataset that also has a product id column that I get from calling a stored procedure. How can I inner join those 2?
Dynamic SQL is very powerful, but it has some severe drawbacks. One of them is exactly this: you cannot use its result in ad-hoc SQL.
The only way to get the result of an SP into a table is to create a table with a fitting schema and use the INSERT INTO NewTbl EXEC ... syntax.
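For example (a sketch; the procedure name and the column list are hypothetical, and the column list must match exactly what the SP returns, which is exactly the difficulty with a dynamic pivot):
CREATE TABLE NewTbl (ProductID INT, SomeCol1 INT, SomeCol2 VARCHAR(50));  -- schema must fit the SP's output
INSERT INTO NewTbl
EXEC dbo.MyDynamicPivotProc;                                              -- assumed procedure name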
But there are other possibilities:
1) Use SELECT ... INTO ... FROM
Within your SP, when the dynamic SQL is executed, you could add INTO NewTbl to your select:
SELECT Col1, Col2, [...] INTO NewTbl FROM ...
This will create a table with the fitting schema automatically.
You might even hand in the name of the new table as a parameter - it is dynamic SQL anyway - but in that case handling the join outside will be more difficult (it must be dynamic again).
If you need your SP to return the result, you just add SELECT * FROM NewTbl at the end. This will return the same result set as before.
Outside your SP you can join this table as any normal table...
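For example (a sketch; dbo.Products, ProductID and the procedure name are placeholders for your design-time table and your SP):
EXEC dbo.MyDynamicPivotProc;                      -- its dynamic SELECT now ends with INTO NewTbl
SELECT p.*, n.*
FROM dbo.Products p
INNER JOIN NewTbl n ON n.ProductID = p.ProductID;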
BUT - and it is a big but - this will fail if the table already exists...
So you have to drop it first, which can lead to deep trouble if this is a multi-user application with possible concurrency.
If it is not: use IF EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='NewTbl') DROP TABLE NewTbl;
If it is: create the table with a name you pass in as a parameter and run your external query dynamically with this name.
After that you can re-create the table using the SELECT ... INTO syntax...
2) Use XML
One advantage of XML is the fact, that any structure and any amount of data can be stuffed into one single column.
Let your SP return a table with one single XML column. You can - as you know the schema now - create a table and use INSERT INTO XmlTable EXEC ....
Knowing that there will be a ProductID element, you can extract this value and create a two-column derived table with the ID and the corresponding XML. This is easy to join.
Using wildcards in XQuery makes it possible to query XML data without knowing all the details...
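A minimal sketch of that approach (the procedure name, the <row>/<ProductID> element structure and dbo.Products are assumptions):
CREATE TABLE #XmlResults (ResultXml XML);
INSERT INTO #XmlResults EXEC dbo.MyDynamicPivotXmlProc;    -- SP returning one XML column per product row
SELECT p.*, d.ResultXml                                    -- the dynamic columns stay inside the XML
FROM (
    SELECT ResultXml,
           ResultXml.value('(/row/ProductID)[1]', 'int') AS ProductID
    FROM #XmlResults
) AS d
INNER JOIN dbo.Products p ON p.ProductID = d.ProductID;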
3) This was my favourite: Don't use dynamic queries...

sql server linked server to oracle returns no data found when data exists

I have a linked server setup in SQL Server to hit an Oracle database. I have a query in SQL Server that joins on the Oracle table using dot notation. I am getting a “No Data Found” error from Oracle. On the Oracle side, I am hitting a table (not a view) and no stored procedure is involved.
First, when there is no data I should just get zero rows and not an error.
Second, there should actually be data in this case.
Third, I have only seen the ORA-01403 error in PL/SQL code; never in SQL.
This is the full error message:
OLE DB provider "OraOLEDB.Oracle" for linked server "OM_ORACLE" returned message "ORA-01403: no data found".
Msg 7346, Level 16, State 2, Line 1
Cannot get the data of the row from the OLE DB provider "OraOLEDB.Oracle" for linked server "OM_ORACLE".
Here are some more details, but it probably does not mean anything since you don’t have my tables and data.
This is the query with the problem:
select *
from eopf.Batch b join eopf.BatchFile bf
on b.BatchID = bf.BatchID
left outer join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
I can’t understand why I get a “no data found” error. The query below uses the same Oracle table and returns no data but I don’t get an error - I just get no rows returned.
select * from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] where documentUploadID = -1
The query below returns data. I just removed one of the SQL Server tables from the join. But removing the batch table does not change the rows returned from batchFile (271 rows in both cases – all rows in batchFile have a batch entry). It should still be joining the same batchFile rows to the same Oracle rows.
select *
from eopf.BatchFile bf
left outer join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
And this query returns 5 rows. It should be the same 5 from the original query. (I can’t use this because I need data from the batch and batchFile tables.)
select *
from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
where du.documentUploadId
in
(
select bf.ReferenceID
from eopf.Batch b join eopf.BatchFile bf
on b.BatchID = bf.BatchID);
Has anyone experienced this error?
Today I experienced the same problem with an inner join. As creating a table-valued function as suggested by codechurn, using a temporary table as suggested by user1935511, or changing the join types as suggested by cymorg are not options for me, I would like to share my solution.
I used join hints to steer the query optimizer in the right direction, as the problem seems to arise from the nested loops join strategy being used with the remote table. For me the HASH, MERGE and REMOTE join hints worked.
For you REMOTE will not be an option because it can only be used for inner join operations, so using something like the following should work.
select *
from eopf.Batch b
join eopf.BatchFile bf
on b.BatchID = bf.BatchID
left outer merge join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
I've had the same problem.
Solution 1: load the data from the Oracle database into a temp table, then join to that temp table instead.
Another post points out that the problem can be caused by using a left join; after changing my query accordingly, the problem was solved.
In my case I had a complex view made from a linked table, 3 views based on the linked table and a local table. I was using Inner Joins throughout and this problem manifested. Changing the joins to Left and Right Outer Joins (as appropriate) resolved the issue.
Another way to work around the problem is to pull back the Oracle data into a Table Valued Function. This will cause SQL Server to go out and retrieve all of the data from Oracle and throw it into a resultant table variable. For all intent and purpose, the Oracle data is now "local" to SQL Server if you use the resultant Table Valued Function in a query.
I believe the original problem is that SQL Server is trying to optimize the execution of your compound query which includes the remote Oracle query results in-line. By using a Table Valued Function to wrap the Oracle call, SQL Server will optimize the compound query on the resultant table variable returned from the function and not the results from the remote query execution.
CREATE function [dbo].[documents]()
returns @results TABLE (
DOCUMENT_ID INT NOT NULL,
TITLE VARCHAR(6) NOT NULL,
LEGALNAME VARCHAR(50) NOT NULL,
AUTHOR_ID INT NOT NULL,
DOCUMENT_TYPE VARCHAR(1) NOT NULL,
LAST_UPDATE DATETIME
) AS
BEGIN
INSERT INTO @results
SELECT CAST(DOCUMENT_ID AS INT) AS DOCUMENT_ID, TITLE, LEGALNAME, CAST(AUTHOR_ID AS INT) AS AUTHOR_ID, DOCUMENT_TYPE, LAST_UPDATE
FROM OPENQUERY(ORACLE_SERVER,
'select DOCUMENT_ID, TITLE, LEGALNAME, AUTHOR_ID, DOCUMENT_TYPE, FUNDTYPE, LAST_UPDATE
from documents')
return
END
You can then use the Table Valued Function as if it were a table in your SQL queries:
SELECT * FROM DOCUMENTS()
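Applied to the original query, the linked-server reference can then be swapped for the function (a sketch; it assumes the function is adjusted to return the documentUploadID column used in the join):
SELECT *
FROM eopf.Batch b
JOIN eopf.BatchFile bf ON b.BatchID = bf.BatchID
LEFT OUTER JOIN dbo.documents() du ON bf.ReferenceID = du.documentUploadID;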
I resolved it by avoiding the = operator. Try using this instead:
select * from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] where documentUploadID < 0

How to force SQL Server to process CONTAINS clauses before WHERE clauses?

I have a SQL query that uses both standard WHERE clauses and full text index CONTAINS clauses. The query is built dynamically from code and includes a variable number of WHERE and CONTAINS clauses.
In order for the query to be fast, it is very important that the full text index be searched before the rest of the criteria are applied.
However, SQL Server chooses to process the WHERE clauses before the CONTAINS clauses, and that causes table scans, so the query is very slow.
I'm able to rewrite this using two queries and a temporary table. When I do so, the query executes 10 times faster. But I don't want to do that in the code that creates the query because it is too complex.
Is there a way to force SQL Server to process the CONTAINS before anything else? I can't force a plan (USE PLAN) because the query is built dynamically and varies a lot.
Note: I have the same problem on SQL Server 2005 and SQL Server 2008.
You can signal your intent to the optimiser like this
SELECT
*
FROM
(
SELECT *
FROM
WHERE
CONTAINS
) T1
WHERE
(normal conditions)
However, SQL is declarative: you say what you want, not how to do it. So the optimiser may decide to ignore the nesting above.
You can force the derived table with CONTAINS to be materialised before the classic WHERE clause is applied. I won't guarantee performance.
SELECT
*
FROM
(
SELECT TOP 2000000000
*
FROM
....
WHERE
CONTAINS
ORDER BY
SomeID
) T1
WHERE
(normal conditions)
Try doing it with 2 queries without temp tables:
SELECT *
FROM table
WHERE id IN (
SELECT id
FROM table
WHERE contains_criterias
)
AND further_where_classes
As I noted above, this is NOT as clean a way to "materialize" the derived table as the TOP clause that #gbn proposed, but a loop join hint forces an order of evaluation, and has worked for me in the past (admittedly usually with two different tables involved). There are a couple of problems though:
The query is ugly
you still don't get any guarantees that the other WHERE parameters don't get evaluated until after the join (I'll be interested to see what you get)
Here it is though, given that you asked:
SELECT OriginalTable.XXX
FROM (
SELECT XXX
FROM OriginalTable
WHERE
CONTAINS XXX
) AS ContainsCheck
INNER LOOP JOIN OriginalTable
ON ContainsCheck.PrimaryKeyColumns = OriginalTable.PrimaryKeyColumns
AND OriginalTable.OtherWhereConditions = OtherValues

Large sets of sql parameters in query

I have two disconnected sql servers that have to have correlated queries run between them. What is the best way to run a query such as:
select * from table where id in (1..100000)
Where the 1..100000 are ids I'm getting from the other database and are not contiguous.
The in clause doesn't support that many parameters, and creating a temp table to do a subquery on takes forever. Are there any other options? Using Sql Server 2005 as the DB, C# as my lang.
Linking the servers is not an option.
If possible, set them up as linked servers. Then you can query the other server directly.
Once you have your link setup, you should also consider that an INNER JOIN or EXISTS will likely perform better.
Syntax might be off slightly, as my server to server MSSQL is rusty, but...
Select * from table where id in (select id from [Server_Two\Some_Instance].[SomeDatabase].[user].table2)
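The EXISTS variant of the same cross-server filter might look roughly like this (same caveat that the four-part name syntax may be slightly off):
Select t.* from table t
where exists (
select 1
from [Server_Two\Some_Instance].[SomeDatabase].[user].table2 t2
where t2.id = t.id
)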
To work around the number of IN parameters allowed without querying across servers, you can bucket them into multiple queries with subsets of the ids and connect them with a UNION. Kinda kludgy, but it should work.
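Roughly like this (the chunk boundaries are only illustrative; UNION ALL is fine here because the chunks do not overlap):
select * from table where id in (1, 7, 23 /* ...first chunk of ids... */)
union all
select * from table where id in (100004, 100010 /* ...next chunk... */)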
You could use a function to break down the input string and return a table. There are plenty of questions on here about how to have dynamic parameters with in clauses which should have an example.
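A minimal sketch of such a function (SQL Server 2005 compatible; the name, the comma delimiter and the absence of input validation are assumptions):
CREATE FUNCTION dbo.SplitIds (@list VARCHAR(MAX))
RETURNS @ids TABLE (id INT NOT NULL)
AS
BEGIN
    DECLARE @pos INT, @next INT;
    SET @pos = 1;
    WHILE @pos <= LEN(@list)
    BEGIN
        -- find the next delimiter (or the end of the string) and cast the piece in between
        SET @next = CHARINDEX(',', @list, @pos);
        IF @next = 0 SET @next = LEN(@list) + 1;
        INSERT INTO @ids (id) VALUES (CAST(SUBSTRING(@list, @pos, @next - @pos) AS INT));
        SET @pos = @next + 1;
    END
    RETURN;
END
The query can then join against it instead of carrying a huge IN list:
SELECT t.*
FROM atable t
INNER JOIN dbo.SplitIds('1,4,7,12') ids ON ids.id = t.id;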
If you can link your servers you could join between the two servers.
The other option which it sounds like you've explored is creating a temp table with the id's that are going to be used as criteria for a join to the primary table.
select * from atable a
inner join #temptable t on a.id = t.id
Since they're ID's I'm assuming they are indexed.
How are you generating the in? If it is text, you can generate it differently. Or does this cause the same error?
SELECT.....
WHERE id in (1..10000)
OR id in (10001..20000)
-- etc.
