So I have two queries that I'm working on, one comes from an Oracle DB and the other SQL Server DB. I'm trying to use PowerBI via Power Query as the cross over between the two. Because of the size of the Oracle DB I'm having a problem with running it, so my thought is to use one query as a clause/sub-query of the other to limit the number of results.
Based on the logic of MSFT's M language I'd assume there's a way to do sub-queries of another but I've yet to figure it out. Does anyone have any ideas on how to do this?
What I have learned to do is create a connection to the two under lying data sets, but do not load them. Then you can merge them, but do not use the default Table.NestedJoin().
After the PQ generates this, change it to:
= Table.Join(dbo_SCADocument,{"VisitID"},VISIT_KEY,{"VisitID"}) .
Also remove the trailing name. The reason is , it keeps query folding alive. For some reason, Table.NestedJoin() kills query folding. Note, if there are similar fields in the two sources other than the join it will fail.
Also it brings everything from both sources, but that is easy to alter . Also you will need to turn off the function firewall as this does not allow you to join potential sensitive data with non sensitive data. This is setting your privacy level to ignore all.
I would attempt this using the Merge command. I'm more of a UI guy, so I would click the Merge button. This generates a PQL statement for Table.Join
https://msdn.microsoft.com/en-us/library/mt260788.aspx
Setting Join Kind to Inner will restrict the output to matching rows.
I say attempt as your proposed query design will likely slow your query, not improve it. I would expect PQ to run both queries against the 2 servers and download all their data, then attempt the Join in Memory.
Related
I have a query that looks essentially like this:
SELECT *
FROM [TheDB].[dbo].[TheTable] [T1]
INNER JOIN [TheDB].[dbo].[TheTable] [T2] ON [T1].[Col] = [T2].[Col]
WHERE
[T1].[ID] > [T2].[ID]
AND [T1].[Type] = 'X'
AND [T2].[Type] = 'X'
A fairly simple query designed to find some cases where we have some duplicate data. (The > is used rather than <> just to keep from having the instances show up twice.)
We're explicitly naming the DB, schema, and table in each case. Seems fine.
The odd thing is that if my SSMS is pointing to the DB in question, TheDB, the query never returns...it just spins and spins. If, on the other hand, I point my SSMS to a different DB (changing it from, say, TheDB to TheOtherDB) or if I prefix my query with a line like USE TheOtherDB, then it returns instantly.
This is very repeatable, across restarts of SSMS, and for different users.
What's going on? I'm guessing there's a wildly wrong optimization plan or something that is being used in some cases but not others...but the whole thing seems very strange...that pointing to the DB I'm actually using makes it slower.
To answer some of the questions/comments in the comments:
What's the goal of the query? To find 'duplicate' rows. Do I need to use *? No...and in the real query I'm not. It's just part of the query anonymization. I can change it to [T1].[ID] with the same result.
Could be a bad cached plan...(any trivial modification to the query text should do, really) I can change the query in various ways (like described above, returning just the [T1].[ID] for example and continue to get the same results.
Perhaps the context databases have different compatibility levels...? In this case, both DBs are set to have a compatibility level of "Sql Server 2019 (150)".
I am trying to run this query
$claims = Claim::wherein('policy_id', $userPolicyIds)->where('claim_settlement_status', 'Accepted')->wherebetween('intimation_date', [$startDate, $endDate])->get();
Here, $userPolicyIds can have thousands of policy ids. Is there any way I can increase the maximum number of parameters in SQL server? If not, could anyone help me find a way to solve this issue?
The wherein method creates an SQL fragment of the form WHERE policy_id IN (userPolicyIds[0], userPolicyIds[1], userPolicyIds[2]..., userPolicyIds[MAX]). In other words, the entire collection is unwrapped into the SQL statement. The result is a HUGE SQL statement that SQL Server refuses to execute.
This is a well known limitation of Microsoft SQL Server. And it is a hard limit, because there appears to be no option for changing it. But SQL Server can hardly be blamed for having this limit, because trying to execute a query with as many as 2000 parameters is an unhealthy situation that you should not have put yourself into in the first place.
So, even if there was a way to change the limit, it would still be advisable to leave the limit as it is, and restructure your code instead, so that this unhealthy situation does not arise.
You have at least a couple of options:
Break your query down to batches of, say, 2000 items each.
Add your fields into a temporary table and make your query join that table.
Personally, I would go with the second option, since it will perform much better than anything else, and it is arbitrarily scalable.
I solved this problem by running this raw query
SELECT claims.*, policies.* FROM claims INNER JOIN policies ON claims.policy_id = policies.id
WHERE policy_id IN $userPolicyIds AND claim_settlement_status = 'Accepted' AND intimation_date BETWEEN '$startDate' AND '$endDate';
Here, $userPolicyIds is a string like this ('123456','654321','456789'). This query is a bit slow, I'll admit that. But the number of policy ids is always going to a very big number and I wanted a quick fix.
just use prepare driver_options (PDO::prepare)
PDO::ATTR_EMULATE_PREPARES => true
https://learn.microsoft.com/en-us/sql/connect/php/pdo-prepare
and split where in on peaces (where (column in [...]) or column in [...])
Is it possible to capture the query text of the entire RPC SELECT statement from within the view which it calls? I am developing an enterprise database in SQL Server 2014 which must support a commercial application that submits a complex SELECT statement in which the outer WHERE clause is critical. This statement also includes subqueries against the same object (in my case, a view) which do NOT include that condition. This creates a huge problem because the calling application is joining those subqueries to the filtered result on another field and thus producing ID duplication that throws errors elsewhere.
The calling application assumes it is querying a table (not a view) and it can only be configured to use a simple WHERE clause. My task is to implement a more sophisticated security model behind this rather naive application. I can't re-write the offending query but I had hoped to retrieve the missing information from the cached query plan. Here's a super-simplified psuedo-view of my proposed solution:
CREATE VIEW schema.important_data AS
WITH a AS (SELECT special_function() AS condition),
b AS (SELECT c.criteria
FROM a, lookup_table AS c
WHERE a.condition IS NULL OR c.criteria = a.condition)
SELECT d.field_1, d.field_2, d.filter_field
FROM b, underlying_table AS d
WHERE d.filter_field = b.criteria;
My "special function" reads the query plan for the RPC, extracts the WHERE condition and preemptively filters the view so it always returns only what it should. Unfortunately, query plans don't seem to be cached until after they are executed. If the RPC is made several times, my solution works perfectly -- but only on the second and subsequent calls.
I am currently using dm_exec_cached_plans, dm_exec_query_stats and dm_exec_sql_text to obtain the full text of the RPC by searching on spid, plan creation time and specific words. It would be so much better if I could somehow get the currently executing plan. I believe dm_exec_requests does that but it only returns the current statement -- which is just the definition of my function.
Extended Events look promising but unfamiliar and there is a lot to digest. I haven't found any guidance, either, on whether they are appropriate for this particular challenge, whether a notification can be obtained in real time or how to ensure that the Event Session is always running. I am pursuing this investigation now and would appreciate any suggestions or advice. Is there anything else I can do?
This turns out to be an untenable idea and I have devised a less elegant work-around by restructuring the view itself. There is a performance penalty but the downstream error is avoided. My fundamental problem is the way the client application generates its SQL statements and there is nothing I do about that -- so, users will just have to accept whatever limitations may result.
I am creating a Java function that needs to use a SQL query with a lot of joins before doing a full scan of its result. Instead of hard-coding a lot of joins I decided to create a view with this complex query. Then the Java function just uses the following query to get this result:
SELECT * FROM VW_####
So the program is working fine but I want to make it faster since this SELECT command is taking a lot of time. After taking a look on its plan execution plan I created some indexes and made it +-30% faster but I want to make it faster.
The problem is that every operation in the execution plan have cost between 0% and 4% except one operation, a clustered-index insert that has +-50% of the execution cost. I think that the system is using a temporary table to store the view's data, but an index in this view isn't useful for me because I need all rows from it.
So what can I do to optimize that insert in the CWT_PrimaryKey? I think that I can't turn off that index because it seems to be part of the SQL Server's internals. I read somewhere that this operation could appear when you use cursors but I think that I am not using (or does the view use it?).
The command to create the view is something simple (no T-SQL, no OPTION, etc) like:
create view VW_#### as SELECTS AND JOINS HERE
And here is a picture of the problematic part from the execution plan: http://imgur.com/PO0ZnBU
EDIT: More details:
Well the query to create the problematic view is a big query that join a lot of tables. Based on a single parameter the Java-Client modifies the query string before creating it. This view represents a "data unit" from a legacy Database migrated to the SQLServer that didn't had any Foreign or Primary Key, so our team choose to follow this strategy. Because of that the view have more than 50 columns and it is made from the join of other seven views.
Main view's query (with a lot of Portuguese words): http://pastebin.com/Jh5vQxzA
The other views (from VW_Sintese1 until VW_Sintese7) are created like this one but without using extra views, they just use joins with the tables that contain the data requested by the main view.
Then the Java Client create a prepared Statement with the query "Select * from VW_Sintese####" and execute it using the function "ExecuteQuery", something like:
String query = "Select * from VW_Sintese####";
PreparedStatement ps = myConn.prepareStatement(query,ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = ps.executeQuery();
And then the program goes on until the end.
Thanks for the attention.
First: you should post the code of the view along with whatever is using the views because of the rest of this answer.
Second: the definition of a view in SQL Server is later used to substitute in querying. In other words, you created a view, but since (I'm assuming) it isn't an indexed view, it is the same as writing the original, long SELECT statement. SQL Server kind of just swaps it out in the DML statement.
From Microsoft's 'Querying Microsoft SQL Server 2012': T-SQL supports the following table expressions: derived tables, common table expressions (CTEs), views, inline table-valued functions.
And a direct quote:
It’s important to note that, from a performance standpoint, when SQL Server optimizes
queries involving table expressions, it first unnests the table expression’s logic, and therefore interacts with the underlying tables directly. It does not somehow persist the table expression’s result in an internal work table and then interact with that work table. This means that table expressions don’t have a performance side to them—neither good nor
bad—just no side.
This is a long way of reinforcing the first statement: please include the SQL code in the view and what you're actually using as the SELECT statement. Otherwise, we can't help much :) Cheers!
Edit: Okay, so you've created a view (no performance gain there) that does 4-5 LEFT JOIN on to the main view (again, you're not helping yourself out much here by eliminating rows, etc.). If there are search arguments you can use to filter down the resultset to fewer rows, you should have those in here. And lastly, you're ordering all of this at the top, so your query engine will have to get those views, join them up to a massive SELECT statement, figure out the correct order, and (I'm guessing here) the result count is HUGE and SQL's db engine is ordering it in some kind of temporary table.
The short answer: get less data (fewer columns and only the rows you need); don't order the results if the resultset is very large, just get the data to the client and then sort it there.
Again, if you want more help, you'll need to post table schemas and index strategies for all tables that are in the query (including the views that are joined) and you'll need to include all view definitions (including the views that are joined).
We've got a weird problem with joining tables from SQL Server 2005 and MS Access 2003.
There's a big table on the server and a rather small table locally in Access. The tables are joined via 3 fields, one of them a datetime field (containing a day; idea is to fetch additional data (daily) from the big server table to add data to the local table).
Up until the weekend this ran fine every day. Since yesterday we experienced strange non-time-outs in Access with this query. Non-time-out means that the query runs forever with rather high network transfer, but no timeout occurs. Access doesn't even show the progress bar. Server trace tells us that the same query is exectuted over and over on the SQL server without error but without result either. We've narrowed it down to the problem seemingly being accessing server table with a big table and either JOIN or WHERE containing a date, but we're not really able to narrow it down. We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Thanks, Mike.
If you join a local table in Access to a linked table in SQL Server, and the query isn't really trivial according to specific limitations of joins to linked data, it's very likely that Access will pull the whole table from SQL Server and perform the join locally against the entire set. It's a known problem.
This doesn't directly address the question you ask, but how far are you from having all the data in one place (SQL Server)? IMHO you can expect the same type of performance problems to haunt you as long as you have some data in each system.
If it were all in SQL Server a pass-through query would optimize and use available indexes, etc.
Thanks for your quick answer!
The actual query is really huge; you won't be happy with it :)
However, we've narrowed it down to a simple:
SELECT * FROM server_table INNER JOIN access_table ON server_table.date = local_table.date;
If the server_table is a big table (hard to say, we've got 1.5 million rows in it; test tables with 10 rows or so have worked) and the local_table is a table with a single cell containing a date. This runs forever. It's not only slow, It just does nothing besides - it seems - causing network traffic and no time out (this is what I find so strange; normally you get a timeout, but this just keeps on running).
We've just found KB article 828169; seems to be our problem, we'll look into that. Thanks for your help!
Use the DATEDIFF function to compare the two dates as follows:
' DATEDIFF returns 0 if dates are identical based on datepart parameter, in this case d
WHERE DATEDIFF(d,Column,OtherColumn) = 0
DATEDIFF is optimized for use with dates. Comparing the result of the CONVERT function on both sides of the equal (=) sign might result in a table scan if either of the dates is NULL.
Hope this helps,
Bill
Try another syntax ? Something like:
SELECT * FROM BigServerTable b WHERE b.DateFld in (SELECT DISTINCT s.DateFld FROM SmallLocalTable s)
The strange thing in your problem description is "Up until the weekend this ran fine every day".
That would mean the problem is really somewhere else.
Did you try creating a new blank Access db and importing everything from the old one ?
Or just refreshing all your links ?
Please post the query that is doing this, just because you have indexes doesn't mean that they will be used. If your WHERE or JOIN clause is not sargable then the index will not be used
take this for example
WHERE CONVERT(varchar(49),Column,113) = CONVERT(varchar(49),OtherColumn,113)
that will not use an index
or this
WHERE YEAR(Column) = 2008
Functions on the left side of the operator (meaning on the column itself) will make the optimizer do an index scan instead of a seek because it doesn't know the outcome of that function
We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Access can kill many good things....have you looked into blocking at all
run
exec sp_who2
look at the BlkBy column and see who is blocking what
Just an idea, but in SQL Server you can attach your Access database and use the table there. You could then create a view on the server to do the join all in SQL Server. The solution proposed in the Knowledge Base article seems problematic to me, as it's a kludge (if LIKE works, then = ought to, also).
If my suggestion works, I'd say that it's a more robust solution in terms of maintainability.