Optimizing SQL Server Table Valued function with Full Text Search tables - sql-server

I have been trying to analyse and optimize a runtime spike in a stored procedure in SQL Server 2014.
Background of the DB objects
I found that the stored procedure shows a spike in runtime when there are more records to process in the source table.
The part of the stored procedure that is lagging is a SELECT INTO temp table (say #tempTable2) statement that joins another temp table (say #tempTable1) with a multi-statement table-valued function (say testFunction).
Code format below:
SELECT t.col1, f.col2
INTO #tempTable2
FROM #tempTable1 t
OUTER APPLY testFunction(t.id) f
This function returns results very quickly when I try to test it by passing a single value like below:
SELECT *
FROM testFunction(100)
But when the actual stored procedure statement runs, it gets stuck in the function for hours (e.g. if #tempTable1 has 12K records, the above statement takes 9 hours to complete).
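One rough way I can check whether the cost simply scales with the number of rows fed to the function is to run the same pattern over a small sample (a sketch using the names above; #tempSample and the TOP (100) sample size are arbitrary):
SELECT t.col1, f.col2
INTO #tempSample
FROM (SELECT TOP (100) * FROM #tempTable1) AS t
OUTER APPLY testFunction(t.id) AS f;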
This function calls several other multi-statement table-valued functions.
It also queries a table (say table1) that has full-text search enabled and uses full-text search functions such as CONTAINS and CONTAINSTABLE.
The table currently has 6 million records.
The full-text index on the table has change tracking set to OFF, and it is configured for 3 text columns in the table.
The table also has a clustered index on the Id (int) column.
This table is truncated and reloaded daily, and then the statement below executes:
ALTER FULLTEXT INDEX ON indx_table1 START FULL POPULATION
No other maintenance seems to be done for the full-text index.
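For what it's worth, a quick way to check whether the daily full population has actually finished before the procedure runs is to look at the table's full-text properties (a sketch; the dbo schema is an assumption):
-- TableFulltextPopulateStatus: 0 = idle (population finished); other values mean a population is still running or paused.
SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.table1'), 'TableFulltextPopulateStatus') AS populate_status,
       OBJECTPROPERTYEX(OBJECT_ID('dbo.table1'), 'TableFulltextItemCount') AS item_count;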
Checking sys.fulltext_index_fragments returns only 1 record, with the details below:
status | data_size | row_count
4      | 465517531 | 1408231
The max full-text crawl range setting for the DB is 4.
I suspect the bad performance of the query is due to the unmaintained full-text index, but I do not have any proof of it.
If anyone has some idea about this, please share your thoughts.
Edit:
Inserted the script for the TVF (testFunction) and the function called from it (fnt_testfunction3) at:
https://www.codepile.net/pile/6BDJNPA0
The table with full-text search used in testFunction is tbl_FullTextSearch.

Related

Creating PL/SQL procedure to fill intermediary table with random data

As part of my classes on relational databases, I have to create procedures as part of a package to fill some of the tables of an Oracle database I created with random data, more specifically the tables community, community_account and community_login_info (see ERD linked below). I succeeded in doing this for the tables community and community_account; however, I'm having some problems generating data for the table community_login_info. It serves as an intermediary table for the many-to-many relationship between community and community_account, linking the IDs of both tables.
My latest approach was to create an associative array with the structure of the target table community_login_info. I then do a cross join of community and community_account (there's already random data in there) along with random timestamps, bulk collect that result into the associative array variable, and then insert those contents into the target table community_login_info. But it seems I'm doing something wrong, since Oracle returns error ORA-00947 'not enough values'. To me it seems all columns of the target table get a value in the insert; what am I missing here? I added the code from my package body below.
ERD snapshot
PROCEDURE mass_add_rij_koppeling_community_login_info
IS
TYPE type_rec_communties_accounts IS RECORD
(type_community_id community.community_id%type,
type_account_id community_account.account_id%type,
type_start_timestamp_login community_account.start_timestamp_login%type,
type_eind_timestamp_login community_account.eind_timestamp_login%type);
TYPE type_tab_communities_accounts
IS TABLE of type_rec_communties_accounts
INDEX BY pls_integer;
t_communities_accounts type_tab_communities_accounts;
BEGIN
SELECT community_id,account_id,to_timestamp(start_datum_account) as start_timestamp_login, to_timestamp(eind_datum_account) as eind_timestamp_login
BULK COLLECT INTO t_communities_accounts
FROM community
CROSS JOIN community_account
FETCH FIRST 50 ROWS ONLY;
FORALL i_index IN t_communities_accounts.first .. t_communities_accounts.last
SAVE EXCEPTIONS
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
values (t_communities_accounts(i_index));
END mass_add_rij_koppeling_community_login_info;
Your error refers to the part:
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
values (t_communities_accounts(i_index));
(By the way, the complete error message gives you the line number where the error is located; that can help narrow down the problem.)
When you specify the columns to insert, you need to specify the columns in the VALUES part too:
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
VALUES (t_communities_accounts(i_index).type_community_id,
        t_communities_accounts(i_index).type_account_id,
        t_communities_accounts(i_index).type_start_timestamp_login,
        t_communities_accounts(i_index).type_eind_timestamp_login);
If the table COMMUNITY_LOGIN_INFO doesn't have any more columns, you could use this syntax:
INSERT INTO community_login_info
VALUES (t_communities_accounts(i_index));
But I don't like performing inserts without specifying the columns: I could end up inserting the start time into the end time (and vice versa) if the record's fields are not defined in exactly the same order as the table's columns. And if the table definition changes over time and new columns are added, you have to modify your procedure to account for the new column, even if that column should just receive NULL because this procedure doesn't fill it.
PROCEDURE mass_add_rij_koppeling_community_login_info
IS
TYPE type_rec_communties_accounts IS RECORD
(type_community_id community.community_id%type,
type_account_id community_account.account_id%type,
type_start_timestamp_login community_account.start_timestamp_login%type,
type_eind_timestamp_login community_account.eind_timestamp_login%type);
TYPE type_tab_communities_accounts
IS TABLE of type_rec_communties_accounts
INDEX BY pls_integer;
t_communities_accounts type_tab_communities_accounts;
BEGIN
SELECT community_id,account_id,to_timestamp(start_datum_account) as start_timestamp_login, to_timestamp(eind_datum_account) as eind_timestamp_login
BULK COLLECT INTO t_communities_accounts
FROM community
CROSS JOIN community_account
FETCH FIRST 50 ROWS ONLY;
FORALL i_index IN t_communities_accounts.first .. t_communities_accounts.last
SAVE EXCEPTIONS
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
VALUES (t_communities_accounts(i_index).type_community_id,
        t_communities_accounts(i_index).type_account_id,
        t_communities_accounts(i_index).type_start_timestamp_login,
        t_communities_accounts(i_index).type_eind_timestamp_login);
END mass_add_rij_koppeling_community_login_info;

SQL Server variable Table or index causing performance issues

I am trying to build a stored procedure that retrieves information from a few tables in my databases. I often use a table variable to hold data, since I have to return it in a result set and also reuse it in subsequent queries instead of querying the table multiple times.
Is this a good and common way to do that?
I started having performance issues when testing the stored procedure. By the way, is there an efficient way to test it without having to change the parameters each time? If I don't change the parameter values, the query takes only a few milliseconds to run; I assume it uses some sort of cache.
The performance issues appeared even though everything was working well the day before, so I reworked my queries and checked that all indexes were being used correctly, etc. Then I tried switching the table variable for a temp table just for testing purposes, and bingo: the next 2 or 3 tests ran like a charm, and then the performance issues started appearing again. So I am a bit clueless about what is happening here and why.
I am running my tests on the production DB since the procedure doesn't update or insert anything. Here is a piece of code to give you an idea of my test case:
--Stuff going on to get values in a temp table for the next query
DECLARE @ApplicationIDs TABLE(ID INT)
-- This table has over 110 000 000 rows and this query uses one of its indexes. The query inserts between 1 and 10-20k rows
INSERT INTO @ApplicationIDs(ID)
SELECT ApplicationID
FROM Schema.Application
WHERE Columna = value
AND Columnb = value
AND Columnc = value
-- I query the table again, joined with other tables, to get my final result set; no performance issues here. ApplicationID is the clustered primary key
SELECT Columns
FROM Schema.Application
INNER JOIN SomeTable ON Columna = Columnb
WHERE ApplicationID IN (SELECT ID FROM @ApplicationIDs)
-- This is where it starts happening. This table has around 200 000 000 rows and about 50 columns, and yes, the ApplicationID column is indexed (nonclustered). I use this index the same way in a few other contexts and it works well, just not here
SELECT Columns
FROM Schema.SubApplication
WHERE ApplicationID IN (SELECT ID FROM @ApplicationIDs)
The server is a VM with 64 GB of RAM, and SQL Server has 56 GB allocated.
Let me know if you need further details.
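For reference, the temp table variant I mentioned trying would look roughly like this (a sketch based on the snippet above; the PRIMARY KEY on ID is an assumption I would add to help the IN lookups):
CREATE TABLE #ApplicationIDs (ID INT PRIMARY KEY);
INSERT INTO #ApplicationIDs (ID)
SELECT ApplicationID
FROM Schema.Application
WHERE Columna = value
AND Columnb = value
AND Columnc = value;
SELECT Columns
FROM Schema.SubApplication
WHERE ApplicationID IN (SELECT ID FROM #ApplicationIDs);
DROP TABLE #ApplicationIDs;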

Get a list of columns and widths for a specific record

I want a list of properties about a given table and for a specific record of data from that table - in one result
Something like this:
Column Name , DataLength, SchemaLengthMax
...and for only one record (based on a where filter)
So what I'm thinking is something like this:
- Get a list of columns from sys.columns and also the schema-based maxlength value
- populate column names into a temp table that includes (column_name, data_length, schema_size_max)
- now loop over that temp table and for each column name, fetch the data for that column based on a specific record, then update the temp table with the length of this data
- finally, select from the temp table
sound reasonable?
Yup. That way works. Not sure if it's the best, since it involves one iteration per column along with the where condition on the source table.
Consider this, instead:
1. Get the candidate records into a temporary table after applying the where condition. Make sure to get a primary key. If there is no primary key, generate a row id (for example with ROW_NUMBER(); assuming SQL Server 2005 or above).
2. Create a temporary table (say, #RecValueLens) that has three columns: Primary_Key_Value, MyColumnName, MyValueLen.
3. Loop through the list of column names (after taking only the column names into another temporary table) and build the SQL statement shown in step 4.
4. Insert Into #RecValueLens (Primary_Key_Value, MyColumnName, MyValueLen)
   Select Max(Primary_Key_Goes_Here), Max('Column_Name_Goes_Here') as MyColumnName, Len(Max(Column_Name_Goes_Here)) as MyValueLen
   From Source_Table_Goes_Here
   Group By Primary_Key_Goes_Here
5. So, if there are 10 columns, you will have 10 insert statements. You could either insert them into a temporary table and run them in a loop, or, if the number of columns is small, concatenate all the statements into a single batch.
6. Run the SQL statement(s) from above. Now you have record-wise, column-wise value lengths. What is left is to get the column definitions.
7. Get the column definitions from sys.columns into a temporary table and join with #RecValueLens to get the output.
Do you want me to write it for you ?
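For reference, here is a minimal sketch of steps 3 to 7 built as a single dynamic SQL batch (all names are hypothetical: dbo.Source_Table with a primary key column Id, filtering on Id = 42; it assumes every column can be cast to NVARCHAR(MAX)):
IF OBJECT_ID('tempdb..#RecValueLens') IS NOT NULL DROP TABLE #RecValueLens;
CREATE TABLE #RecValueLens (Primary_Key_Value INT, MyColumnName SYSNAME, MyValueLen INT);
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'';
-- Build one INSERT per column from sys.columns.
SELECT @sql = @sql + N'
INSERT INTO #RecValueLens (Primary_Key_Value, MyColumnName, MyValueLen)
SELECT Id, N''' + c.name + N''', LEN(CAST(' + QUOTENAME(c.name) + N' AS NVARCHAR(MAX)))
FROM dbo.Source_Table
WHERE Id = 42;
'
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.Source_Table');
EXEC sys.sp_executesql @sql;
-- Join the measured lengths back to the schema-defined maximum length.
SELECT r.MyColumnName, r.MyValueLen AS DataLength, c.max_length AS SchemaLengthMax
FROM #RecValueLens AS r
JOIN sys.columns AS c
  ON c.object_id = OBJECT_ID('dbo.Source_Table')
 AND c.name = r.MyColumnName;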

Getting bulk data into a busy table

I am currently performing analysis on a client's MSSQL Server. I've already fixed many issues (unnecessary indexes, index fragmentation, NEWID() being used for identities all over the shop, etc.), but I've come across a specific situation that I haven't seen before.
Process 1 imports data into a staging table, then Process 2 copies the data from the staging table using an INSERT INTO. The first process is very quick (it uses BULK INSERT), but the second takes around 30 mins to execute. The "problem" SQL in Process 2 is as follows:
INSERT INTO ProductionTable(field1,field2)
SELECT field1, field2
FROM SourceHeapTable (nolock)
The above INSERT statement inserts hundreds of thousands of records into ProductionTable, each row allocating a UNIQUEIDENTIFIER, and inserting into about 5 different indexes. I appreciate this process is going to take a long time, so my issue is this: while this import is taking place, a 3rd process is responsible for performing constant lookups on ProductionTable - in addition to inserting an additional record into the table as such:
INSERT INTO ProductionTable(fields...)
VALUES(values...)
SELECT *
FROM ProductionTable (nolock)
WHERE ID = @Id
For the 30 or so minutes that the INSERT...SELECT above is taking place, the INSERT INTO times out.
My immediate thought is that SQL Server is locking the entire table during the INSERT...SELECT. I did quite a lot of profiling on the server during my analysis, and there are definitely locks being allocated for the duration of the INSERT...SELECT, though I fail to remember what type they were.
Having never needed to insert records into a table from two sources at the same time - at least during an ETL process - I'm not sure how to approach this. I've been looking up INSERT table hints, but most are being made obsolete in future versions.
It looks to me like a CURSOR is the only way to go here?
You could consider BULK INSERT for Process-2 to get the data into the ProductionTable.
Another option would be to batch Process-2 into small batches of around 1000 records and use a Table Valued Parameter to do the INSERT. See: http://msdn.microsoft.com/en-us/library/bb510489.aspx#BulkInsert
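A rough sketch of the shape that TVP approach implies (the table type, procedure, and column definitions below are hypothetical, not taken from the question):
CREATE TYPE dbo.ProductionRowType AS TABLE (field1 INT, field2 NVARCHAR(100));
GO
CREATE PROCEDURE dbo.InsertProductionBatch
    @Rows dbo.ProductionRowType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- Each call inserts one small batch, keeping individual transactions short.
    INSERT INTO ProductionTable (field1, field2)
    SELECT field1, field2
    FROM @Rows;
END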
It seems like a table lock.
Try inserting in portions (small batches) in the ETL process. Something like:
while 1=1
begin
    INSERT INTO ProductionTable(field1,field2)
    SELECT top (1000) field1, field2
    FROM SourceHeapTable sht (nolock)
    where not exists (select 1 from ProductionTable pt where pt.id = sht.id)
    -- optional
    --waitfor delay '00:00:01.0'
    if @@rowcount = 0
        break;
end

How to calculate the hash of a row in SQL Anywhere 11 database table?

My application is continuously polling the database. For optimization purposes, I want the application to query the database only if the tables have been modified. So I want to calculate the hash of the entire table and compare it with the last saved hash of the table. (I plan to compute the hash by first calculating the hash of each row and then hashing those hashes, i.e. a HASH of HASHes.)
I found that there is a Checksum() SQL utility function for SQL Server which computes a hash/checksum for one row.
Is there any similar utility/query to find the hash of a row in a SQL Anywhere 11 database?
FYI, the database table does not have any column with a precomputed hash/checksum.
Got the answer. We can compute the hash of a particular column of a table using the query below:
-- SELECT HASH(column_name, hash_algorithm)
-- For example:
SELECT HASH(column_name, 'md5')
FROM MyTable
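Building on that, the "hash of hashes" idea from the question could be sketched like this (illustrative names: MyTable with primary key id and data columns col1 and col2; note that LIST() materializes all row hashes in memory, so this is only practical for modest row counts):
SELECT HASH(
           LIST(HASH(CAST(id AS VARCHAR(20)) || '|' ||
                     COALESCE(col1, '') || '|' ||
                     COALESCE(col2, ''), 'md5'),
                ',' ORDER BY id),
           'md5') AS table_hash
FROM MyTable;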
This creates a hash over all data of a table, to detect changes in any column:
CREATE VARIABLE @tabledata LONG VARCHAR;
UNLOAD TABLE MyTable INTO VARIABLE @tabledata ORDER ON QUOTES OFF COMPRESSED;
SET @tabledata = Hash(@tabledata);
IF (@tabledata <> '40407ede9683bcfb46bc25151139f62c') THEN
    SELECT @tabledata AS hash;
    SELECT * FROM MyTable;
ENDIF;
DROP VARIABLE @tabledata;
Of course this is expensive and shouldn't be used if the data is hundreds of megabytes. But if the only other way is comparing all the data for any changes, this will be faster and produces load and memory consumption only on the db server.
If the change detection is only needed for a few columns, you can use UNLOAD SELECT col FROM table INTO ... instead.
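A minimal sketch of that narrower variant (illustrative column and table names; the explicit 'md5' and the ORDER BY for a deterministic result are assumptions):
CREATE VARIABLE @coldata LONG VARCHAR;
UNLOAD SELECT col FROM MyTable ORDER BY id INTO VARIABLE @coldata QUOTES OFF COMPRESSED;
SET @coldata = HASH(@coldata, 'md5');
SELECT @coldata AS col_hash;
DROP VARIABLE @coldata;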
