Sql code into kdb code - sql-server

I am taking a query in SQL Server and trying to write it in KDB. I have done this before without issue but this time the kdb query seems to be returning a different amount of rows than SQL when it should return the same. My SQL code is:
SELECT
*
FROM tbl_Master
INNER JOIN map_2012
ON tbl_Master.RM_ID = map_2012.RMID
LEFT JOIN src_CQ
ON map_2012.INST = src_CQ.INST
My kdb code is
Y:map_2012 ij `RMID xkey select RMID:RM_ID, RG_ID, RM_Name from
tbl__Master
Y:src_CQ lj `INST xkey Y
It is simple code but I can not figure out why the returned rows are so much different

In SQL query you left join src_CQ to map_2012. To do the same in KDB map_2012 has to be on the left and src_CQ on the right:
Y:Y lj `INST xkey src_CQ
Also note, that in case map_2012 has duplicate values in RMID column or src_CQ has duplicate values in INST column, keying of table in KDB will remove rows with the duplicates. Under these circumstances another approach is required, I believe equi-join eq will help in this case.

Related

Is there an equivalent to OR clause in CONTAINSTABLE - FULL TEXT INDEX

I am trying to find a solution in order to improve the String searching process and I selected FULL-TEXT INDEX Strategy.
However, after implementing it, I still can see there is a performance hit when it comes to search by using multiple strings using multiple Full-Text Index tables with OR clauses.
(E.x. WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%'))
As a solution, I am trying to use CONTAINSTABLE expecting a performance improvement.
Now, I am facing an issue with CONTAINSTABLE when it comes to joining tables with a LEFT JOIN
Please go through the example below.
Query 1
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
LEFT JOIN CONTAINSTABLE(P.Building,*,'%John%') AS FFTIndex ON F.ID = FFTIndex.[Key]
LEFT JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
LEFT JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
LEFT JOIN P.Person p ON pr2.ID = p.PID
LEFT JOIN CONTAINSTABLE(P.Person,FirstName,'%John%') AS PFTIndex ON P.ID = PFTIndex.[Key]
WHERE F.Name IS NOT NULL
This produces the below result.
Query 2
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
INNER JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
INNER JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
INNER JOIN P.Person p ON pr2.ID = p.PID
WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%')
AND F.Name IS NOT NULL
Result
Expectation
To use query 1 in a way that works as the behavior of an SQL SERVER OR clause. As I can understand Query 1's CONTAINSTABLE, joins the data with the building table, and the rest of the results are going to ignore so that the CONTAINSTABLE of the Person table gets data that already contains the keyword filtered from the building table.
If the keyword = Building, I want to match the keyword in both the tables regardless of searching a saved record in both the tables. Having a record in each table is enough.
Summary
Query 2 performs well but is creates a slowness when the words in the indexes are growing. Query 1 seems optimized(When it comes to multiple online resources and MS Documentation),
however, it does not give me the expected output.
Is there any way to solve this problem?
I am not strictly attached to CONTAINSTABLE. Suggesting another optimization method will also be considerable.
Thank you.
Hard to say definitively without your full data set but a couple of options to explore
Remove Invalid % Wildcards
Why are you using '%SearchTerm%'? Does performance improve if you use the search term without the wildcards (%)? If you want a word that matches a prefix, try something like
WHERE CONTAINS (String,'"SearchTerm*"')
Try Temp Tables
My guess is CONTAINS is slightly faster than CONTAINSTABLE as it doesn't calculate a rank, but I don't know if anyone has ever attempted to benchmark it. Either way, I'd try saving off the matches to a temp table before joining up to the rest of the tables. This will allow the optimizer to create a better execution plan
SELECT ID INTO #Temp
FROM YourTable
WHERE CONTAINS (String,'"SearchTerm"')
SELECT *
FROM #Temp
INNER JOIN...
Optimize Full Text Index by Removing Noisy Words
You might find you have some noisy words aka words that reoccur many times in your data that are meaningless like "the" or perhaps some business jargon. Adding these to your stop list will mean your full text index will ignore them, making your index smaller thus faster
The query below will list indexed words with the most frequent at the top
Select *
From sys.dm_fts_index_keywords(Db_Id(),Object_Id('dbo.YourTable') /*Replace with your table name*/)
Order By document_count Desc
This OR That Criteria
For your WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%') criteria where you want this or that, is tricky. OR clauses generally perform even when using simple equality operators.
I'd try either doing two queries and union the results like:
SELECT * FROM Table1 F
/*Other joins and stuff*/
WHERE CONTAINS(F.*,'%Gayan%')
UNION
SELECT * FROM Table2 P
/*Other joins and stuff*/
WHERE CONTAINS(P.FirstName,'%John%')
OR this is much more work, but you could load all your data into giant denormalized table with all your columns. Then apply a full text index to that table and adjust your search criteria that way. It'd probably be the fastest method searching, but then you'd have to ensure the data is sync between the denormalized table and the underlying normalized tables
SELECT B.*,P.* INTO DenormalizedTable
FROM Building AS B
INNER JOIN People AS P
CREATE FULL TEXT INDEX ft ON DenormalizedTable
etc...

PostgreSQL: Using AND statement in LEFT JOIN is not working as expected

This query returns all the elements in the table la and all nulls for fields coming from the lar table which is not what I expected.
SELECT
la.listing_id,
la.id,
lar.*
FROM la
LEFT JOIN lar
ON lar.application_id = la.id AND la.listing_id = 2780;
This query returns correct and expected results but shouldn't both queries do the same thing ?
SELECT
la.listing_id,
la.id,
lar.*
FROM la
LEFT JOIN lar
ON lar.application_id = la.id
WHERE la.listing_id = 2780;
What am I missing here?
I want to make conditional joins as I have noticed that for complex queries Postgresql does the join then do the WHERE clause which is actually very slow. How to make the database filter out some records before doing the JOIN ?
The confusion around LEFT JOIN and WHERE clause has been clarified many times:
SQL / PostgreSQL left join ignores "on = constant" predicate, on left table
This interesting question remains:
How to make the database filter out some records before doing the JOIN?
There are no explicit query hints in Postgres. (Which is a matter of ongoing debate.) But there are still various tricks to make Postgres bend your way.
But first, ask yourself: Why did the query planner estimate the chosen plan to be cheaper to begin with? Is your server configuration basically sane? Cost settings adequate? autovacuum running? Postgres version outdated? Are you working around an underlying problem that should really be fixed?
If you force Postgres to do it your way, you should be sure it won't fire back, after a version upgrade or update to the server configuration ... You'd better know what you are doing exactly.
That said, you can force Postgres to "filter out some records before doing the JOIN" with a subquery where you add OFFSET 0 - which is just noise, logically, but prevents Postgres from rearranging it into the form of a regular join. (Query hint after all)
SELECT la.listing_id, la.id, lar.*
FROM (
SELECT listing_id, id
FROM la
WHERE listing_id = 2780
OFFSET 0
) la
LEFT JOIN lar ON lar.application_id = la.id;
Or you can use a CTE (less obscure, but more expensive). Or other tricks like setting certain config parameters. Or, in this particular case, I would use a LATERAL join to the same effect:
SELECT la.listing_id, la.id, lar.*
FROM la
LEFT JOIN LATERAL (
SELECT *
FROM lar
WHERE application_id = la.id
) lar ON true
WHERE la.listing_id = 2780;
Related:
Sample Query to show Cardinality estimation error in PostgreSQL
Here is an extensive blog on Query hints by 2ndQuadrant. Five year old but still valid.
The LEFT JOIN keyword returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match.
So no matter you try to filter with AND la.listing_id = 2780; you still get all the rows from first table. But only those with la.listing_id = 2780; will have something <> NULL on the right side
The behaviour is different if you try INNER JOIN in that case only the matching columns are created and the AND condition will filter the rows.
So to make the first query work you need add WHERE la.listing_id IS NOT NULL
The problem with second query is will try to JOIN every row and then will filter only the one you need.

Sql Query Error "ORA-01403: no data found" [duplicate]

I have a linked server setup in SQL Server to hit an Oracle database. I have a query in SQL Server that joins on the Oracle table using dot notation. I am getting a “No Data Found” error from Oracle. On the Oracle side, I am hitting a table (not a view) and no stored procedure is involved.
First, when there is no data I should just get zero rows and not an error.
Second, there should actually be data in this case.
Third, I have only seen the ORA-01403 error in PL/SQL code; never in SQL.
This is the full error message:
OLE DB provider "OraOLEDB.Oracle" for linked server "OM_ORACLE" returned message "ORA-01403: no data found".
Msg 7346, Level 16, State 2, Line 1
Cannot get the data of the row from the OLE DB provider "OraOLEDB.Oracle" for linked server "OM_ORACLE".
Here are some more details, but it probably does not mean anything since you don’t have my tables and data.
This is the query with the problem:
select *
from eopf.Batch b join eopf.BatchFile bf
on b.BatchID = bf.BatchID
left outer join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
I can’t understand why I get a “no data found” error. The query below uses the same Oracle table and returns no data but I don’t get an error - I just get no rows returned.
select * from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] where documentUploadID = -1
The query below returns data. I just removed one of the SQL Server tables from the join. But removing the batch table does not change the rows returned from batchFile (271 rows in both cases – all rows in batchFile have a batch entry). It should still be joining the same batchFile rows to the same Oracle rows.
select *
from eopf.BatchFile bf
left outer join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
And this query returns 5 rows. It should be the same 5 from the original query. ( I can’t use this because I need data from the batch and batchFile table).
select *
from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
where du.documentUploadId
in
(
select bf.ReferenceID
from eopf.Batch b join eopf.BatchFile bf
on b.BatchID = bf.BatchID);
Has anyone experienced this error?
Today I experienced the same problem with an inner Join. As creating a Table Valued Function suggested by codechurn or using a Temporary Table suggested by user1935511 or changing the Join Types suggested by cymorg are no options for me, I like to share my solution.
I used Join Hints to drive the query optimizer into the right direction, as the problem seems to rise up from nested loops join strategy with the remote table locally . For me HASH, MERGE and REMOTE join hints worked.
For you REMOTE will not be an option because it can be used only for inner join operations. So using something like the following should work.
select *
from eopf.Batch b
join eopf.BatchFile bf
on b.BatchID = bf.BatchID
left outer merge join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
I've had the same problem.
Solution1: load the data from the Oracle database into a temp table, then join to that temp table instead - here's a link.
From this post a link you can find out that the problem can be with using left join.
I've checked with my problem and after changing my query it solved the problem.
In my case I had a complex view made from a linked table, 3 views based on the linked table and a local table. I was using Inner Joins throughout and this problem manifested. Changing the joins to Left and Right Outer Joins (as appropriate) resolved the issue.
Another way to work around the problem is to pull back the Oracle data into a Table Valued Function. This will cause SQL Server to go out and retrieve all of the data from Oracle and throw it into a resultant table variable. For all intent and purpose, the Oracle data is now "local" to SQL Server if you use the resultant Table Valued Function in a query.
I believe the original problem is that SQL Server is trying to optimize the execution of your compound query which includes the remote Oracle query results in-line. By using a Table Valued Function to wrap the Oracle call, SQL Server will optimize the compound query on the resultant table variable returned from the function and not the results from the remote query execution.
CREATE function [dbo].[documents]()
returns #results TABLE (
DOCUMENT_ID INT NOT NULL,
TITLE VARCHAR(6) NOT NULL,
LEGALNAME VARCHAR(50) NOT NULL,
AUTHOR_ID INT NOT NULL,
DOCUMENT_TYPE VARCHAR(1) NOT NULL,
LAST_UPDATE DATETIME
) AS
BEGIN
INSERT INTO #results
SELECT CAST(DOCUMENT_ID AS INT) AS DOCUMENT_ID, TITLE, LEGALNAME, CAST(AUTHOR_ID AS INT) AS AUTHOR_ID, DOCUMENT_TYPE, LAST_UPDATE
FROM OPENQUERY(ORACLE_SERVER,
'select DOCUMENT_ID, TITLE, LEGALNAME, AUTHOR_ID, DOCUMENT_TYPE, FUNDTYPE, LAST_UPDATE
from documents')
return
END
You can then use the Table Valued Function as it it were a table in your SQL queries:
SELECT * FROM DOCUMENTS()
I resolved it by avoiding the = operator. Try using this instead:
select * from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] where documentUploadID < 0

sql server linked server to oracle returns no data found when data exists

I have a linked server setup in SQL Server to hit an Oracle database. I have a query in SQL Server that joins on the Oracle table using dot notation. I am getting a “No Data Found” error from Oracle. On the Oracle side, I am hitting a table (not a view) and no stored procedure is involved.
First, when there is no data I should just get zero rows and not an error.
Second, there should actually be data in this case.
Third, I have only seen the ORA-01403 error in PL/SQL code; never in SQL.
This is the full error message:
OLE DB provider "OraOLEDB.Oracle" for linked server "OM_ORACLE" returned message "ORA-01403: no data found".
Msg 7346, Level 16, State 2, Line 1
Cannot get the data of the row from the OLE DB provider "OraOLEDB.Oracle" for linked server "OM_ORACLE".
Here are some more details, but it probably does not mean anything since you don’t have my tables and data.
This is the query with the problem:
select *
from eopf.Batch b join eopf.BatchFile bf
on b.BatchID = bf.BatchID
left outer join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
I can’t understand why I get a “no data found” error. The query below uses the same Oracle table and returns no data but I don’t get an error - I just get no rows returned.
select * from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] where documentUploadID = -1
The query below returns data. I just removed one of the SQL Server tables from the join. But removing the batch table does not change the rows returned from batchFile (271 rows in both cases – all rows in batchFile have a batch entry). It should still be joining the same batchFile rows to the same Oracle rows.
select *
from eopf.BatchFile bf
left outer join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
And this query returns 5 rows. It should be the same 5 from the original query. ( I can’t use this because I need data from the batch and batchFile table).
select *
from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
where du.documentUploadId
in
(
select bf.ReferenceID
from eopf.Batch b join eopf.BatchFile bf
on b.BatchID = bf.BatchID);
Has anyone experienced this error?
Today I experienced the same problem with an inner Join. As creating a Table Valued Function suggested by codechurn or using a Temporary Table suggested by user1935511 or changing the Join Types suggested by cymorg are no options for me, I like to share my solution.
I used Join Hints to drive the query optimizer into the right direction, as the problem seems to rise up from nested loops join strategy with the remote table locally . For me HASH, MERGE and REMOTE join hints worked.
For you REMOTE will not be an option because it can be used only for inner join operations. So using something like the following should work.
select *
from eopf.Batch b
join eopf.BatchFile bf
on b.BatchID = bf.BatchID
left outer merge join [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] du
on bf.ReferenceID = du.documentUploadID;
I've had the same problem.
Solution1: load the data from the Oracle database into a temp table, then join to that temp table instead - here's a link.
From this post a link you can find out that the problem can be with using left join.
I've checked with my problem and after changing my query it solved the problem.
In my case I had a complex view made from a linked table, 3 views based on the linked table and a local table. I was using Inner Joins throughout and this problem manifested. Changing the joins to Left and Right Outer Joins (as appropriate) resolved the issue.
Another way to work around the problem is to pull back the Oracle data into a Table Valued Function. This will cause SQL Server to go out and retrieve all of the data from Oracle and throw it into a resultant table variable. For all intent and purpose, the Oracle data is now "local" to SQL Server if you use the resultant Table Valued Function in a query.
I believe the original problem is that SQL Server is trying to optimize the execution of your compound query which includes the remote Oracle query results in-line. By using a Table Valued Function to wrap the Oracle call, SQL Server will optimize the compound query on the resultant table variable returned from the function and not the results from the remote query execution.
CREATE function [dbo].[documents]()
returns #results TABLE (
DOCUMENT_ID INT NOT NULL,
TITLE VARCHAR(6) NOT NULL,
LEGALNAME VARCHAR(50) NOT NULL,
AUTHOR_ID INT NOT NULL,
DOCUMENT_TYPE VARCHAR(1) NOT NULL,
LAST_UPDATE DATETIME
) AS
BEGIN
INSERT INTO #results
SELECT CAST(DOCUMENT_ID AS INT) AS DOCUMENT_ID, TITLE, LEGALNAME, CAST(AUTHOR_ID AS INT) AS AUTHOR_ID, DOCUMENT_TYPE, LAST_UPDATE
FROM OPENQUERY(ORACLE_SERVER,
'select DOCUMENT_ID, TITLE, LEGALNAME, AUTHOR_ID, DOCUMENT_TYPE, FUNDTYPE, LAST_UPDATE
from documents')
return
END
You can then use the Table Valued Function as it it were a table in your SQL queries:
SELECT * FROM DOCUMENTS()
I resolved it by avoiding the = operator. Try using this instead:
select * from [OM_ORACLE]..[OM].[DOCUMENT_UPLOAD] where documentUploadID < 0

How can I speed up this SQL view?

I'm a beginner at this so hope you can help. I'm working in SQL server 2008R2 and have a view that is comprised from four tables all joined together:
SELECT DISTINCT ad.award_id,
bl.funding_id,
bl.budget_line,
dd4.monthnumberofyear AS month,
dd4.yearcalendar AS year,
CASE
WHEN frb.full_value IS NULL THEN '0'
ELSE frb.full_value
END AS Expenditure_value,
bl.budget_id,
frb.accode,
'Actual' AS Type
FROM dw.dbo.dimdate5 AS dd4
LEFT OUTER JOIN dbo.award_data AS ad
ON dd4.fulldate BETWEEN ad.usethisstartdate AND
ad.usethisenddate
LEFT OUTER JOIN dbo.budget_line AS bl
ON bl.award_id = ad.award_id
LEFT OUTER JOIN dw.dbo.fctresearchbalances AS frb
ON frb.el3 = bl.award_id
AND frb.element4groupidnew = bl.budget_line
AND dd4.yearfiscal = frb.yr
AND dd4.monthnumberfiscal = frb.period
The view has 9 columns and 1.5 million rows and growing. A select * from this view was taking 20 minutes for all the rows. I added indexes on the fields in the tables that are joined on and that improved it to 10 minutes. My question is what else could I do to get the select to run faster?
Many thanks, Violet.
Try getting rid of the case statement.
If you have 1.5 million rows, if you're interesting in the aggregation of those rows rather than the whole set, you might want to sum the rows in fctResearchBalances first and then do the joins.
(It's a bit difficult to determine what else you might benefit from, without seeing the access plan.)
1- You can use stored procedure to have buffer cache.
2- you can use indexed view , this means creating index on schemabound views.
3- you can use query hints in join to order the query optimizer to use special kind of join.
4- you can use table partitioning .
SELECT DISTINCT --#1 - potential bottleneck
ad.award_id
, bl.funding_id
, bl.budget_line
, [month] = dd4.monthnumberofyear
, [year] = dd4.yearcalendar
, Expenditure_value = ISNULL(frb.full_value, '0')
, bl.budget_id
, frb.accode
, [type] = 'Actual'
FROM dbo.dimdate5 dd4
LEFT JOIN dbo.award_data ad ON dd4.fulldate BETWEEN ad.usethisstartdate AND ad.usethisenddate
LEFT JOIN dbo.budget_line bl ON bl.award_id = ad.award_id
LEFT JOIN dbo.fctresearchbalances frb ON frb.el3 = bl.award_id --#2 - join by multiple columns
AND frb.element4groupidnew = bl.budget_line
AND dd4.yearfiscal = frb.yr
AND dd4.monthnumberfiscal = frb.period
The CASE statement can be replace by
COALESCE(frb.full_value,'0') AS Expenditure_value
Without more info it's not possible to tell exactly what is wrong but just to give you some pointers.
When you have so many LEFT JOINS the order of the joins can make a difference.
Do you have standard indexes or covering indexes with included columns?
If you don't have covering indexes, then primary keys matter in the joins. Including all the primary key columns in the join condition will speed up the query.
Then look at your data - do you need all those LEFT JOINS base on the foreign keys between those tables? Depending on your keys a LEFT JOIN may be equivalent to an INNER JOIN.
And with all those LEFT JOINS is having a DISTINCT really useful?
How much RAM do you have? If you have 8GB+ then 1.5m rows is nothing for SQL Server. You need to optimise those joins.

Resources