How to calculate the hash of a row in SQL Anywhere 11 database table? - database

My application continuously polls the database. As an optimization, I want the application to query the database only if the tables have been modified. So I want to calculate a hash of the entire table and compare it with the last saved hash of the table. (I plan to compute it by first calculating the hash of each row and then hashing those hashes, i.e. a hash of hashes.)
I found that SQL Server has a CHECKSUM() function that computes a hash/checksum for one row.
Is there any similar utility/query to find the hash of a row in a SQL Anywhere 11 database?
FYI, the table does not have any column with a precomputed hash/checksum.

Got the answer. We can compute the hash of a particular column of a table using the query below:
-- SELECT HASH(column_name, hash_algorithm)
-- For example:
SELECT HASH(column_name, 'md5')
FROM MyTable
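The asker's "hash of hashes" plan can be sketched at the application level too. Below is a minimal illustration using Python and SQLite as a stand-in for SQL Anywhere (table and column names are made up for the example): hash each row, then hash the concatenation of the row hashes, and compare the result with the last saved value before doing any real work.

```python
import hashlib
import sqlite3

def table_hash(conn, table):
    # Order rows deterministically so identical data always yields the same hash.
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
    outer = hashlib.md5()
    for row in rows:
        # Hash each row, then feed the row hashes into an outer hash.
        row_hash = hashlib.md5(repr(row).encode()).hexdigest()
        outer.update(row_hash.encode())
    return outer.hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, name TEXT)")
conn.execute("INSERT INTO MyTable VALUES (1, 'a'), (2, 'b')")

before = table_hash(conn, "MyTable")
conn.execute("UPDATE MyTable SET name = 'c' WHERE id = 2")
after = table_hash(conn, "MyTable")
print(before != after)  # any change in the data changes the table hash
```

The deterministic ORDER BY matters: without it, two physically identical tables could hash differently just because rows came back in a different order.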

This creates a hash over all data of a table, to detect changes in any column:
CREATE VARIABLE #tabledata LONG VARCHAR;
UNLOAD TABLE MyTable INTO VARIABLE #tabledata ORDER ON QUOTES OFF COMPRESSED;
SET #tabledata = Hash(#tabledata);
IF (#tabledata <> '40407ede9683bcfb46bc25151139f62c') THEN
SELECT #tabledata AS hash;
SELECT * FROM MyTable;
END IF;
DROP VARIABLE #tabledata;
Of course this is expensive and shouldn't be used if the data is hundreds of megabytes. But if the only alternative is comparing all the data for changes, this is faster and puts the load and memory consumption only on the database server.
If the change detection is only needed for a few columns, you can use UNLOAD SELECT col FROM table INTO ... instead.

Related

Creating PL/SQL procedure to fill intermediary table with random data

As part of my classes on relational databases, I have to create procedures as part of package to fill some of the tables of an Oracle database I created with random data, more specifically the tables community, community_account and community_login_info (see ERD linked below). I succeeded in doing this for tables community and community_account, however I'm having some problems with generating data for table community_login_info. This serves as an intermediary table between the many to many relationship of community and community_account, linking the id's of both tables.
My latest approach was to create an associative array with the structure of the target table community_login_info. I then do a cross join of community and community_account (there's already random data in there) along with random timestamps, bulk collect that result into the variable of the associative array, and then insert those contents into the target table community_login_info. But it seems I'm doing something wrong, since Oracle returns error ORA-00947 'not enough values'. To me it seems all columns of the target table get a value in the insert; what am I missing here? I added the code from my package body below.
ERD snapshot
PROCEDURE mass_add_rij_koppeling_community_login_info
IS
TYPE type_rec_communties_accounts IS RECORD
(type_community_id community.community_id%type,
type_account_id community_account.account_id%type,
type_start_timestamp_login community_account.start_timestamp_login%type,
type_eind_timestamp_login community_account.eind_timestamp_login%type);
TYPE type_tab_communities_accounts
IS TABLE of type_rec_communties_accounts
INDEX BY pls_integer;
t_communities_accounts type_tab_communities_accounts;
BEGIN
SELECT community_id,account_id,to_timestamp(start_datum_account) as start_timestamp_login, to_timestamp(eind_datum_account) as eind_timestamp_login
BULK COLLECT INTO t_communities_accounts
FROM community
CROSS JOIN community_account
FETCH FIRST 50 ROWS ONLY;
FORALL i_index IN t_communities_accounts.first .. t_communities_accounts.last
SAVE EXCEPTIONS
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
values (t_communities_accounts(i_index));
END mass_add_rij_koppeling_community_login_info;
Your error refers to the part:
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
values (t_communities_accounts(i_index));
(By the way, the complete error message gives you the line number where the error is located, it can help to focus the problem)
When you specify the columns to insert, then you need to specify the columns in the VALUES part too:
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
VALUES (t_communities_accounts(i_index).community_id,
t_communities_accounts(i_index).account_id,
t_communities_accounts(i_index).start_timestamp_login,
t_communities_accounts(i_index).eind_timestamp_login);
If the table COMMUNITY_LOGIN_INFO doesn't have any more columns, you could use this syntax:
INSERT INTO community_login_info
VALUES (t_communities_accounts(i_index));
But I don't like performing inserts without specifying the columns. I could end up inserting the start time into the end time and vice versa if I haven't defined the record fields in exactly the same order as the table definition. And if the definition of the table changes over time and new columns are added, you have to modify your procedure to add the new column, even if the new column should just stay NULL because this procedure doesn't fill it.
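The risk described above is easy to demonstrate outside Oracle. Here is a toy illustration in Python/SQLite (table and column names are invented for the example): a positional insert silently swaps values when the supplied order doesn't match the table's column order, while a named-column insert keeps the mapping explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_info (start_ts TEXT, end_ts TEXT)")

row = {"start_ts": "2023-01-01", "end_ts": "2023-12-31"}

# Positional insert: correctness depends on remembering the table's order.
# Here the values are (accidentally) supplied end-first, so they get swapped.
conn.execute("INSERT INTO login_info VALUES (?, ?)", (row["end_ts"], row["start_ts"]))

# Named-column insert: the mapping is explicit and survives column reordering.
conn.execute(
    "INSERT INTO login_info (start_ts, end_ts) VALUES (:start_ts, :end_ts)", row
)

print(conn.execute("SELECT * FROM login_info").fetchall())
# The first row has swapped timestamps; the second row is correct.
```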
PROCEDURE mass_add_rij_koppeling_community_login_info
IS
TYPE type_rec_communties_accounts IS RECORD
(type_community_id community.community_id%type,
type_account_id community_account.account_id%type,
type_start_timestamp_login community_account.start_timestamp_login%type,
type_eind_timestamp_login community_account.eind_timestamp_login%type);
TYPE type_tab_communities_accounts
IS TABLE of type_rec_communties_accounts
INDEX BY pls_integer;
t_communities_accounts type_tab_communities_accounts;
BEGIN
SELECT community_id,account_id,to_timestamp(start_datum_account) as start_timestamp_login, to_timestamp(eind_datum_account) as eind_timestamp_login
BULK COLLECT INTO t_communities_accounts
FROM community
CROSS JOIN community_account
FETCH FIRST 50 ROWS ONLY;
FORALL i_index IN t_communities_accounts.first .. t_communities_accounts.last
SAVE EXCEPTIONS
INSERT INTO community_login_info (community_id,account_id,start_timestamp_login,eind_timestamp_login)
SELECT community_id,account_id,start_timestamp_login,eind_timestamp_login
FROM TABLE(CAST(t_communities_accounts AS type_tab_communities_accounts)) a;
END mass_add_rij_koppeling_community_login_info;

Optimizing SQL Server Table Valued function with Full Text Search tables

I have been trying to analyse and optimize a runtime spike in a stored procedure in SQL Server 2014.
Background of the DB objects
I found that the stored procedure's runtime spikes when there are more records in a source table to process.
The lagging part of the stored procedure is a SELECT INTO temp table (say #tempTable2) statement that joins another temp table (say #tempTable1) with a multi-statement table-valued function (say testFunction).
The code looks like this:
SELECT t.col1, f.col2
INTO #tempTable2
FROM #tempTable1 t
OUTER APPLY testFunction(t.id) f
This function returns results very quickly when I try it with a single value like below:
SELECT *
FROM testFunction(100)
But when the actual stored procedure statement runs, it gets stuck in the function for hours (e.g. if #tempTable1 has 12K records, the statement above takes 9 hours to complete).
This function calls different other multi-statement table valued functions from it.
Also it queries from a table (say table1) which has full-text search enabled - uses full-text search functions such as CONTAINS and CONTAINSTABLE.
The table currently has 6 million records.
The full-text index on the table has change tracking set to OFF and is configured for 3 text columns in the table.
Also, the table has a clustered index on the Id (int) column.
This table is truncated and loaded daily, after which the statement below executes:
ALTER FULLTEXT INDEX ON indx_table1 START FULL POPULATION
No other maintenance seems to be done for the full-text index.
Checking sys.fulltext_index_fragments returns only 1 record with the details below:

status    data_size    row_count
4         465517531    1408231

The max full-text crawl range for the DB has the value 4.
I suspect the bad performance of the query is due to the unmaintained full-text index, but I don't have any proof of it.
If anyone has some idea about this, can you please share your thoughts.
Edit:
I've added the script for the TVF (testFunction) and the function it calls (fnt_testfunction3) at:
https://www.codepile.net/pile/6BDJNPA0
The table with full text search used in testFunction is tbl_FullTextSearch

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a DataTable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the DataTable. So I take the GUIDs I want to check (they come in chunks of 1000) and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure that WHERE IN will make the best use of an index on [Group], if it uses one at all. However, if you had a second table containing the GUID values, and that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
)
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.
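The suggested rewrite is easy to prototype. Below is a sketch in Python/SQLite (standing in for SQL Server; table and column names follow the question): load one chunk of GUIDs into an indexed temp table and join against it instead of binding a 1000-parameter IN list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Group_Membership ([Group] TEXT, Member TEXT)")
conn.executemany(
    "INSERT INTO Group_Membership VALUES (?, ?)",
    [("g1", "m1"), ("g2", "m2"), ("g3", "m3")],
)

guid_chunk = ["g1", "g3"]  # one chunk of the GUIDs to sync

# Populate and index a temp table with the chunk, then join instead of IN (...).
conn.execute("CREATE TEMP TABLE Guids (Guid TEXT)")
conn.executemany("INSERT INTO Guids (Guid) VALUES (?)", [(g,) for g in guid_chunk])
conn.execute("CREATE INDEX Idx_Guid ON Guids (Guid)")

matched = conn.execute(
    "SELECT t1.[Group], t1.Member "
    "FROM Group_Membership t1 JOIN Guids t2 ON t1.[Group] = t2.Guid"
).fetchall()
print(matched)  # only the rows for the GUIDs in this chunk
```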

Deleting old records from a very big table based on criteria

I have a table (Table-A) that contains 300 million records, and I want to do a data-retention activity based on some criteria: I want to delete about 200M records from the table.
For performance reasons, I planned to create a new table (Table-B) with the oldest 10M records from Table-A. Then I can select the records from Table-B that match the criteria and delete them from Table-A.
Extracting 10M records from Table-A and loading into Table-B using SQL Loader takes ~5 hours.
I already created indexes and I use parallel 32 wherever applicable.
What I wanted to know is:
Is there any better way to extract from Table-A and load into Table-B?
Is there any better approach other than creating a temp table (Table-B)?
DBMS: Oracle 10g, PL/SQL and Shell.
Thanks.
If you want to delete 70% of the records of your table, the best way is to create a new table that contains the remaining 30% of the rows, drop the old table, and rename the new table to the name of the old table. One possibility to create the new table is a create-table-as-select statement (CTAS), but there are also approaches that make the impact on the running system much smaller, e.g. you can use materialized views to select the remaining data and convert the materialized view to a table. The details of the approach depend on the requirements.
This reading and writing is much more efficient than deleting the rows of the old table.
If you delete the rows of the old table it is probably necessary to reorganize the old table which will also end up in writing these remaining 30% of data.
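The keep-and-swap approach can be sketched end to end. Here is a minimal Python/SQLite stand-in (table names and the retention criterion are invented for the example): copy the rows you want to keep with a CTAS-style statement, drop the old table, and rename the new one into place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, created INTEGER)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)", [(i, i % 10) for i in range(100)]
)

# CTAS-style copy of the 30% of rows to retain (stand-in criterion: created >= 7).
conn.execute("CREATE TABLE table_a_new AS SELECT * FROM table_a WHERE created >= 7")
conn.execute("DROP TABLE table_a")
conn.execute("ALTER TABLE table_a_new RENAME TO table_a")

print(conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0])  # 30 rows kept
```

In Oracle you would additionally need to recreate indexes, constraints, and grants on the new table before the swap, which is part of why the details depend on the requirements.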
Partitioning the table by your criteria may be an option.
Consider a case with the criteria is the month. All January data falls into the Jan partition. All February data falls into the Feb partition...
Then when it comes time to drop all the old January data, you just drop the partition.
Using ROWID is the best approach, but an inline cursor can help you more.
Alternatively: insert the rows you want to keep into another table (INSERT INTO table_b SELECT * FROM table_a WHERE <criteria>), then truncate Table-A.
Is there any better way to extract from Table-A and to load it into Table-B? You can use a parallel CTAS: CREATE TABLE table_b AS SELECT ... FROM table_a. You can use compression and parallel query in one step.
Is there any better approach other than creating a temp table (Table-B)? A better approach would be partitioning Table-A.
Probably the better approach would be partitioning Table-A, but if not, you can try something fast and simple:
declare
i pls_integer :=0 ;
begin
for r in
( -- select what you want to move to second table
SELECT
rowid as rid,
col1,
col2,
col3
FROM
table_a t
WHERE
t.col < SYSDATE - 30 --- or other criteria
)
loop
insert /*+ append */ into table_b values (r.col1, r.col2, r.col3 ); -- insert it to second table
delete from table_a where rowid = r.rid; -- and delete it
if i < 500 -- check your best commit interval
then
i:=i+1;
else
commit;
i:=0;
end if;
end loop;
commit;
end;
In the above example you move your records in small transactions of 500 rows. You could optimize it using a collection and bulk insert, but I wanted to keep the code simple.
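The same move-and-delete loop with periodic commits translates directly to other stacks. Below is a Python/SQLite sketch of it (table and column names are placeholders, and the criterion is invented): copy each matching row to the second table, delete it from the first, and commit every `batch` rows so no single transaction grows too large.

```python
import sqlite3

def move_old_rows(conn, batch=500):
    # Stand-in criterion: move rows with col1 < 30 (the PL/SQL used a date cutoff).
    src = conn.execute("SELECT rowid, col1 FROM table_a WHERE col1 < 30").fetchall()
    pending = 0
    for rid, col1 in src:
        conn.execute("INSERT INTO table_b (col1) VALUES (?)", (col1,))
        conn.execute("DELETE FROM table_a WHERE rowid = ?", (rid,))
        pending += 1
        if pending >= batch:
            conn.commit()  # keep each transaction small
            pending = 0
    conn.commit()  # final partial batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (col1 INTEGER)")
conn.execute("CREATE TABLE table_b (col1 INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?)", [(i,) for i in range(100)])

move_old_rows(conn, batch=10)
print(conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0])  # 70 rows left
```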
I was missing an index on a column that I was using in the search criteria.
Apart from this, there were some indexes missing on referenced tables too.
#miracle173's answer is also good, but we have some foreign keys that might create problems if we used that approach.
+1 to #miracle173

How to copy large number of data from one table to another in same database?

I have two tables with same column structure in the same database: TableA and TableB.
TableA doesn't have any indexes, but TableB has a non-clustered unique index.
TableA has 290 Million rows of data that needs to be copied to TableB.
As they both have same structure, I've tried
INSERT INTO TableB
SELECT *
FROM TableA;
It was executing for hours and produced a huge log file that filled the disk. As a result the disk ran out of space and the query was killed.
I can shrink the log file. How can I copy these many rows of data to another table efficiently?
First of all, disable the index on TableB before inserting the rows. You can do it using T-SQL:
ALTER INDEX IX_Index_Name ON dbo.TableB DISABLE;
Make sure to disable all the constraints (foreign keys, check constraints, unique indexes) on your destination table.
Re-enable (and rebuild) them after the load is complete.
Now, there are a couple of approaches to solve the problem:
If you're OK with a slight chance of data loss: use the INSERT INTO ... SELECT ... FROM ... syntax you already have, but switch your database to the Bulk-logged recovery model first (read up on it before switching). This won't help if you're already in Bulk-logged or Simple.
With exporting the data first: you can use the BCP utility to export/import the data. It supports loading data in batches. Read more about using the BCP utility here.
Fancy, with exporting the data first: With SQL 2012+ you can try exporting the data into binary file (using the BCP utility) and load it by using the BULK INSERT statement, setting ROWS_PER_BATCH option.
Old-school "I don't give a damn" method: to prevent the log from filling up, you will need to perform the inserts in batches of rows, not everything at once. If your database is running in the Full recovery model you will need to keep log backups running, maybe even increasing the frequency of that job. To batch-load your rows you will need a WHILE loop (don't use them in day-to-day stuff, just for batch loads); something like the following will work if you have an identifier column in the dbo.TableA table:
DECLARE @RowsToLoad BIGINT;
DECLARE @RowsPerBatch INT = 5000;
DECLARE @LeftBoundary BIGINT = 0;
DECLARE @RightBoundary BIGINT = @RowsPerBatch;

SELECT @RowsToLoad = MAX(IdentifierColumn) FROM dbo.TableA;

WHILE @LeftBoundary < @RowsToLoad
BEGIN
    INSERT INTO TableB (Column1, Column2)
    SELECT
        tA.Column1,
        tA.Column2
    FROM
        dbo.TableA AS tA
    WHERE
        tA.IdentifierColumn > @LeftBoundary
        AND tA.IdentifierColumn <= @RightBoundary;

    SET @LeftBoundary = @LeftBoundary + @RowsPerBatch;
    SET @RightBoundary = @RightBoundary + @RowsPerBatch;
END
For this to work effectively you really want to consider creating an index on dbo.TableA (IdentifierColumn) just for the time you're running the load.
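The batching logic in that WHILE loop is the portable part. Here is a Python/SQLite sketch of the same range-batched copy (schema and column names are placeholders): copy rows in fixed-size ranges of the identifier column, committing after each batch so the transaction log stays small.

```python
import sqlite3

def batched_copy(conn, rows_per_batch=5000):
    max_id = conn.execute("SELECT MAX(Id) FROM TableA").fetchone()[0] or 0
    left, right = 0, rows_per_batch
    while left < max_id:
        # Copy one half-open range (left, right] of identifiers per iteration.
        conn.execute(
            "INSERT INTO TableB (Id, Payload) "
            "SELECT Id, Payload FROM TableA WHERE Id > ? AND Id <= ?",
            (left, right),
        )
        conn.commit()  # each batch is its own transaction
        left, right = right, right + rows_per_batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (Id INTEGER PRIMARY KEY, Payload TEXT)")
conn.execute("CREATE TABLE TableB (Id INTEGER PRIMARY KEY, Payload TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 12001)]
)

batched_copy(conn, rows_per_batch=5000)
print(conn.execute("SELECT COUNT(*) FROM TableB").fetchone()[0])  # 12000
```

Note the boundaries assume a dense-enough integer identifier; gaps in the column just make some batches smaller, which is harmless.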
