HANA SQL CTE WHERE CONDITION - loops

I'm writing a scripted calculation view on HANA using SQL.
I'm looking for faster alternatives to the logic that I have implemented in a while loop. A simplified version of the code is below.
It tries to find similar-looking vendors in table B for the vendors in table A.
Please bear with me for inaccurate syntax.
v = select vendor, vendorname from A;
while --set a counter here
vendorname = capture the record from v for row number represented by counter here
t = select vendor, vendorname from v where (read single vendor from counter row)
union all
select vendor, vendorname from B where contains(vendorname,:vendorname,fuzzy(0.3))
union all
select vendor, vendorname from t
endwhile
This query dies when there are thousands of records in both tables. After reading a few blogs, I realized that using a loop is the wrong direction.
To make this a little faster, I came across something called a CTE.
When I tried to implement the same logic using a CTE, the system did not allow it.
The sample code I'm trying to write is below. Can anybody please help me get this right? The syntax is not accepted by the system.
t = with mytab ("Vendor", "VendorName")
AS ( select "Vendor", "VendorName" from "A" WHERE ( "Updated_Date" >= :From_Date AND "Updated_Date" <= :To_Date ) )
select * from "B" WHERE CONTAINS ("VendorName", mytab."VendorName",FUZZY(0.3));
The SQL error for this syntax is:
SQL: invalid identifier: MYTAB
I would like to know:
Whether such operation with CTE is allowed. If yes, what is the correct syntax in HANA SQL?
If No, how do I achieve the desired result without looping through one table?
Thanks,
Anup

CTEs are allowed in SAP HANA - you might want to check the HANA SQL reference if you're looking for syntax.
But as you're in a SQLScript context anyhow, you might as well use table variables instead.
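On the `invalid identifier: MYTAB` error: a CTE name is only visible where it is listed in a FROM clause, and the query in the question defines mytab but never selects from it. Below is a minimal sketch of both forms, assuming the table and column names from the question and made-up date bounds. It only shows how the CTE or the table variable is referenced; it does not address whether CONTAINS accepts a column as its search term.

```sql
-- CTE form: mytab has to appear in a FROM clause to be referenced.
with mytab as (
    select "Vendor", "VendorName"
    from "A"
    where "Updated_Date" >= '2020-01-01'   -- hypothetical date range
      and "Updated_Date" <= '2020-12-31'
)
select m."Vendor", m."VendorName"
from mytab m;

-- Table variable form inside SQLScript (here in an anonymous block).
do
begin
    vendors_a = select "Vendor", "VendorName"
                from "A"
                where "Updated_Date" >= '2020-01-01'
                  and "Updated_Date" <= '2020-12-31';

    -- The table variable is read with a leading colon.
    select "Vendor", "VendorName"
    from :vendors_a;
end;
```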
What I'm not sure about is, what you are actually trying to do. Provide a description of your usage scenario, if possible.
Ok, based on your comments, the following approach could work for you.
Note, in my example I use a copy of the USERS system table, so you will have to fit the query to your tables.
do
begin
declare user_names nvarchar(5000);
select string_agg(user_name,' ') into user_names
from cusers
where user_name in ('SYS', 'SYSTEM');
select *
from cusers
where contains (user_name, :user_names, fuzzy(0.3));
end;
What I do here is to get all the potential names for which I want to do a fuzzy lookup into a variable user_names (separated by a space). For this I use the STRING_AGG() aggregation function.
After the first statement is finished, :user_names contains SYSTEM SYS in my example.
Now, CONTAINS lets you search multiple columns for multiple search terms at once (you may want to re-check the reference documentation for details here), so
CONTAINS (<column_name>, 'term1 term2 term3')
looks for all three terms in the column <column_name>.
With that we feed the string SYS SYSTEM into the second query and the CONTAINS clause.
That works fine for me, avoids a join and runs over the table to be searched only once.
BTW: no idea where you get that statement about table variables in read-only procedures from - it's wrong. Of course you can use table variables, in fact it's recommended to make use of them.

Related

Using an IN clause with table variable causes my query to run MUCH slower

I am using an SSRS report in which I need to pass multiple parameters to some SQL code.
Based on this blog post, the best way to handle multiple parameters is to use a split function, so that is the road I am following.
However, I am having some bad performance after following this.
For example, the following WHERE clause will return the data in 4 seconds:
AND DimBusinessDivision.Id IN (
22
)
This will also correctly return in 4 seconds:
DECLARE @BusinessDivisionId INT = 22
AND DimBusinessDivision.Id IN (
@BusinessDivisionId
)
However, using the split function as below, it takes 2 minutes (which is the same time it takes without a WHERE clause):
AND DimBusinessDivision.Id IN (
SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
I've also tried creating a temp table and a table variable before the SQL statement with the results of the table but there's no difference. I have a feeling this has to do with the fact that the values are not literal values and that SQL server doesn't know what query plan to follow, or something similar. Does anyone know of any ways to increase the performance of this?
It simply doesn't like using a table to get the values in even if the table has the same amounts of rows.
UPDATE: I have used the table function as an inner join, which has fixed the issue. Any ideas why this made all the difference?
INNER JOIN
dbo.FuncSplit(@BusinessDivisionIds, ',') AS FilteredBusinessDivisions ON
FilteredBusinessDivisions.Item = DimBusinessDivision.Id
A few things to play with:
Try the non-performant query and add OPTION (RECOMPILE) at the end of the query. If it magically runs much faster, then yes, the issue was a bad cached query plan. For more information on this specific problem, you can Google "parameter sniffing" for a more thorough explanation.
You may also want to look at the function definition and toss a RECOMPILE in there too, and see what difference that makes.
Look at the estimated query plan and try to determine the difference.
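A minimal sketch of the first suggestion. The real report query is not shown in the question, so the SELECT below is a simplified stand-in. It assumes that dbo.FuncSplit returns a column named Item, as in the snippets above, and the value list is made up.

```sql
DECLARE @BusinessDivisionId VARCHAR(100) = '22,23';  -- hypothetical value list

SELECT DimBusinessDivision.Id
FROM DimBusinessDivision
WHERE DimBusinessDivision.Id IN (
    SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
OPTION (RECOMPILE);  -- compile a fresh plan for this execution
```

If the timing drops to roughly that of the literal version, a stale or badly estimated plan is the likely culprit.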
But the root of the problem, I think, is that you are reinventing the wheel with this "split" function. You can have multi-valued parameters in SSRS and use "WHERE col IN (@param)": https://technet.microsoft.com/en-us/library/aa337396(v=sql.105).aspx
Unless there's a very specific reason you must split a comma separated list and cannot use normal parameters, just use a regular parameter that accepts multiple values.
Edit: I looked at the article you linked to. It's quite easy to have a SELECT ALL option in any reporting tool (not just SSRS), though it's not obvious. Using the "magic value" as written in the article you linked to works just fine. Can I ask what limitation is prompting you to need to do this string splitting?

SQL Server : Tables vs Cursors

I'm asking for a high level understanding of what these two things are.
From what I've read, it seems that in general, a query with an ORDER BY clause returns a cursor, and basically cursors have order to them whereas tables are literally a set where order is not guaranteed.
What I don't really understand is, why are these two things talked about like two separate animals. To me, it seems like cursors are a subset of tables. The book I'm reading vaguely mentioned that
"Some language elements and operations in SQL expect to work with
table results of queries and not with cursors; examples include table
expressions and set operators"
My question would be... why not? Why won't SQL handle it like a table anyways even if it's given an ordered set?
Just to clarify, I will type out the paragraph from the book:
"A query with an ORDER BY clause results in what standard SQL calls a cursor - a nonrelational result with order guaranteed among rows. You're probably wondering why it matters whether a query returns a table result or a cursor. Some language elements and operations in SQL expect to work with table results of queries and not with cursors; examples include table expressions and set operators..."
A table is a result set. It has columns and rows. You can join to it with other tables to either filter or combine the data in ONE operation:
SELECT *
FROM TABLE1 T1
JOIN TABLE2 T2
ON T1.PK = T2.PK
A cursor is a variable that stores a result set. It has columns, but the rows are inaccessible - except the top one! You can't access the records directly, rather you must fetch them ONE ROW AT A TIME.
DECLARE TESTCURSOR CURSOR
FOR SELECT * FROM Table1
OPEN TESTCURSOR
FETCH NEXT FROM TESTCURSOR
You can also fetch them into variables, if needed, for more advanced processing.
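As a rough illustration of fetching into variables, here is a sketch that walks the whole cursor one row at a time. It assumes Table1 has an integer column named PK (borrowed from the join example above); the PRINT is only a placeholder for real per-row work.

```sql
DECLARE @pk INT;  -- assumes Table1.PK is an integer column

DECLARE TESTCURSOR CURSOR FOR
    SELECT PK FROM Table1;

OPEN TESTCURSOR;
FETCH NEXT FROM TESTCURSOR INTO @pk;

-- @@FETCH_STATUS is 0 as long as the last fetch returned a row.
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @pk;                            -- per-row processing goes here
    FETCH NEXT FROM TESTCURSOR INTO @pk;
END;

CLOSE TESTCURSOR;
DEALLOCATE TESTCURSOR;
```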
Please let me know if that doesn't clarify it for you.
With regard to this sentence,
"Some language elements and operations in SQL expect to work with
table results of queries and not with cursors; examples include table
expressions and set operators"
I think the author is just saying that there are cases where it doesn't make sense to use an ORDER BY in a fragment of a query, because the ORDER BY should be on the outer query, where it will actually affect the final result of the query.
For instance, I can't think of any point in putting an ORDER BY on a CTE ("table expression") or on the Subquery in an IN( ) expression. UNLESS (in both cases) a TOP n was used as well.
When you create a VIEW, SQL Server will actually not allow you to use an ORDER BY unless a TOP n is also used. Otherwise the ORDER BY should be specified when Selecting from the VIEW, not in the code of the VIEW itself.
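A short sketch of that last point, assuming a table Table1 with a column PK and a hypothetical view name. GO is the batch separator used by tools such as SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Rejected by SQL Server: ORDER BY in a view without TOP.
-- CREATE VIEW dbo.Table1Sorted AS
--     SELECT PK FROM Table1 ORDER BY PK;

-- Accepted: the view stays an unordered table expression...
CREATE VIEW dbo.Table1View AS
    SELECT PK FROM Table1;
GO

-- ...and the ordering is requested by the outer query.
SELECT PK
FROM dbo.Table1View
ORDER BY PK;
```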

T-SQL Update Table using current columns as input parameters of function

I am trying to update table columns using a function. The input parameters of the function are data fields from the table that I want to update.
Let's say I have a table with two columns ("Country" and "Capital"). The "Capital" is entered, and I am using a function that returns a country name with the capital name as its input parameter. So, my update code is something like this:
UPDATE #TableName
SET Country=(SELECT Country FROM dbo.fn_GetCountryByCapital(Capital))
There is no error generated by IntelliSense, but when I press F5 it says:
Incorrect syntax near 'Capital'.
Please note that this is just an example (it may look silly to you). I give this sample in order to describe my problem. My real situation includes the use of several functions in the update statement.
Thank you in advance for the help.
Joro
Possible Solution:
I have found other way to do this. It does not look so good, but it works:
I have added index in my temp table in order to use while statement
For each record in the table (using while statement) I have used temp variables to store the field information I have need
Then I have passed this information to my functions and the outcome I have used to update the table
My guess is that the brackets '( )' surrounding the select statement and the function did not allow the function to use the correct values from the table.
learn the right way (most efficient) to build SQL:
UPDATE a
SET Country=b.Country
FROM #TableName a
INNER JOIN YourCountryCapitalTable b ON a.Capital=b.Capital
you can not code SQL like an application program, you need to use set logic and NOT per row logic. When you throw a bunch of functions into a SQL statement they most likely will need to be run per row, slowing down your queries (unless they are table functions in your FROM clause). If you just incorporate the function's logic into the query you can most likely see massive performance improvements, because indexes can be used and operations occur on the complete set and not row by row.
it is sad to have to write sql code that isn't elegant and often repeats itself all over the place. however, your main sql goal is fast data retrieval (index usage and set operations) and not some fancy coding beauty contest.
"I have found other way to do this." yuck yuck yuck, sounds like a future question here on SO when the next person needs to maintain this code. You don't need an index to use a WHILE. If you have so many rows in your temp table that you need an index, a WHILE is the LAST thing you should be doing!
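If the function has to stay, one way to keep the update set-based is to move it into the FROM clause with CROSS APPLY. This is only a sketch: it assumes dbo.fn_GetCountryByCapital is a table-valued function that returns a Country column (as the SELECT in the question suggests) and that the server supports APPLY (SQL Server 2005 or later).

```sql
UPDATE a
SET    Country = f.Country
FROM   #TableName AS a
CROSS APPLY dbo.fn_GetCountryByCapital(a.Capital) AS f;  -- function evaluated per row of a
```

With CROSS APPLY, rows for which the function returns nothing are simply left unchanged.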

parsing comma separated values in MS-SQL (no csv or such)

I use a closed source commercial application that uses an MS-SQL database. I regularly have to query this database myself for various purposes. This means the table and database design is fixed, and I can't do anything about it at all. I just have to live with it. Now I have two tables with the following layouts (abstracted, not to discredit the software/database designer)
t1: ID (int), att1(varchar), att2(varchar), .... attx(varchar)
t2: ID (int), t1_ids(varchar)
Now the contents of this t1_ids is (shudder) a comma separated list of t1 id's. (for example 12, 456, 43, 675, 54). What I want to do is (you guessed it) join those two tables.
Fortunately for me, these are very small tables, and I don't care about performance in terms of complexity at all (could be O(n^m) as far as I care).
Ideally I would like to make a view that joins these two tables. I don't have any requirements for inserting or updating, just for select statements. What would be the easiest and clearest (in terms of maintainability) way to do this?
To get the first and last too use this:
select *
from t1
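-- wrap the list in commas (spaces removed) so every ID, including the first and last, is delimited
join t2 on ',' + replace(t2.t1_ids, ' ', '') + ',' like '%,' + cast(t1.ID as varchar(20)) + ',%'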
It doesn't matter whether T2.t1_ids starts or ends with a comma. The valid values are enclosed by commas.
EDIT: I realised after posting this answer that the PARSENAME function can only return one part of the parsed string, so it's not as useful as I thought it would be in your situation.
Searching SO for alternative solutions I came across an interesting answer to this question: Split String in SQL
If you can add triggers to your database then you could call the split string on INSERT, UPDATE and DELETE to maintain another table with the id's separated as rows. Then you can create your view using that table.
I know you said you don't mind about the speed of the query but at least that way you are not parsing all the strings every time you query the data.
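On newer servers the split can also be done inline, without triggers. The sketch below assumes SQL Server 2016 or later with database compatibility level 130 (required for STRING_SPLIT), which may not hold for the application in the question. The view name and the t2_id alias are made up.

```sql
CREATE VIEW dbo.t2_with_t1 AS
SELECT t2.ID AS t2_id,   -- renamed so it does not clash with t1.ID
       t1.*
FROM t2
CROSS APPLY STRING_SPLIT(t2.t1_ids, ',') AS s          -- one row per list entry
JOIN t1
  ON t1.ID = TRY_CAST(LTRIM(RTRIM(s.value)) AS int);   -- trim spaces, skip non-numeric entries
```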

How do I filter one of the columns in a SQL Server SQL Query

I have a table (that relates to a number of other tables) where I would like to filter ONE of the columns (RequesterID) - that column will be a combobox where only people that are not sales people should be selectable.
Here is the "unfiltered" query, let's call it QUERY 1:
SELECT RequestsID, RequesterID, ProductsID
FROM dbo.Requests
If using a separate query, let's call it QUERY 2, to filter RequesterID (which is a People related column, connected to People.PeopleID), it would look like this:
SELECT People.PeopleID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
Now, is there a way of "merging" the QUERY 2 into QUERY 1?
(dbo.Requests in QUERY 1 has RequesterID populated as a Foreign Key from dbo.People, so no problem there... The connections are all right, just not know how to write the SQL query!)
UPDATE
Trying to explain what I mean a bit more:
The result set should be a number of REQUESTS, and the number of REQUESTS should not be limited by QUERY 2. QUERY 2's only function is to limit the selectable subset in column Requests.RequesterID. No, it's not that clear, but in the C# VS2008 implementation I use Requests.RequesterID to eventually populate a ComboBox with [Full name], which is another column in the People table, and in that column I don't want SalesGuy to show up as selectable. Here I'm trying to make it EVEN clearer (but with wrong syntax, of course):
SELECT RequestsID, (RequesterID WHERE RequesterID != 8), ProductsID
FROM dbo.Requests
Yes, RequesterID 8 happens to be the SalesGuy :-)
here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions. The main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will perform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
if you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this, OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(@Search1 IS NULL or Column1=@Search1)
AND (@Search2 IS NULL or Column2=@Search2)
AND (@Search3 IS NULL or Column3=@Search3)
and optimize it at run time to be (provided that only @Search2 was passed in with a value):
WHERE
Column2=@Search2
and an index can be used (if you have one defined on Column2)
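Put together, a complete query of that shape might look like the sketch below. The table name dbo.YourTable, the INT types and the sample value are assumptions made only for illustration.

```sql
DECLARE @Search1 INT = NULL,
        @Search2 INT = 42,     -- only this filter is supplied
        @Search3 INT = NULL;

SELECT Column1, Column2, Column3
FROM dbo.YourTable             -- hypothetical table
WHERE (@Search1 IS NULL OR Column1 = @Search1)
  AND (@Search2 IS NULL OR Column2 = @Search2)
  AND (@Search3 IS NULL OR Column3 = @Search3)
OPTION (RECOMPILE);            -- plan is built for the current variable values
```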
How about this? Since the query already joins on the requests table you can simply add the columns to the select-list like so :
SELECT Requests.RequestsID, Requests.RequesterID, Requests.ProductsID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
You can in fact select any column from any of the joined tables (Roles, Requests, People, etc.)
It becomes clear if you just replace People.PeopleId with * and it will show you everything retrieved from the tables.
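If, as the update to the question suggests, the request list itself should stay unfiltered and only the ComboBox choices should exclude sales people, the two concerns could also be kept as two separate queries. This is a sketch under that reading. It assumes the [Full name] column mentioned in the question lives in People.

```sql
-- Result set: all requests, unfiltered (QUERY 1 from the question).
SELECT RequestsID, RequesterID, ProductsID
FROM dbo.Requests;

-- ComboBox source: every person who is not a sales person.
SELECT People.PeopleID, People.[Full name]
FROM People
INNER JOIN Roles ON People.RolesID = Roles.RolesID
WHERE Roles.Role <> N'SalesGuy'
ORDER BY People.[Full name];
```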
