Searching SQL Server database for control characters - sql-server

I'm looking to search for unwanted control characters in an MSSQL database.
I currently use a stored procedure that I create in whichever database I need to search, but it only works when searching for a simple character or string of characters. The procedure as it stands is below (it was originally taken from this site):
CREATE PROC SearchAllTables
(
    @SearchStr nvarchar(100)
)
AS
BEGIN
    -- Creates a Stored Procedure for the database
    -- When running the procedure, set the @SearchStr parameter to the character you are searching for
    CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))
    SET NOCOUNT ON
    DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
    SET @TableName = ''
    SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%','''')
    WHILE @TableName IS NOT NULL
    BEGIN
        SET @ColumnName = ''
        SET @TableName =
        (
            SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE'
            AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
            AND OBJECTPROPERTY(
                OBJECT_ID(
                    QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
                ), 'IsMSShipped'
            ) = 0
        )
        WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)
        BEGIN
            SET @ColumnName =
            (
                SELECT MIN(QUOTENAME(COLUMN_NAME))
                FROM INFORMATION_SCHEMA.COLUMNS
                WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
                AND TABLE_NAME = PARSENAME(@TableName, 1)
                AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
                AND QUOTENAME(COLUMN_NAME) > @ColumnName
            )
            IF @ColumnName IS NOT NULL
            BEGIN
                INSERT INTO #Results
                EXEC
                (
                    'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630)
                    FROM ' + @TableName + ' (NOLOCK) ' +
                    ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
                )
            END
        END
    END
    SELECT ColumnName, ColumnValue FROM #Results
END
Now, I need to alter this to allow me to search for a list of control characters:
'%['
+ CHAR(0)+CHAR(1)+CHAR(2)+CHAR(3)+CHAR(4)
+ CHAR(5)+CHAR(6)+CHAR(7)+CHAR(8)+CHAR(9)
+ CHAR(10)+CHAR(11)+CHAR(12)+CHAR(13)+CHAR(14)
+ CHAR(15)+CHAR(16)+CHAR(17)+CHAR(18)+CHAR(19)
+ CHAR(20)+CHAR(21)+CHAR(22)+CHAR(23)+CHAR(24)
+ CHAR(25)+CHAR(26)+CHAR(27)+CHAR(28)+CHAR(29)
+ CHAR(30)+CHAR(31)+CHAR(127)
+ ']%',
Now the procedure as it stands won't allow me to use this as a search string, and it won't search correctly even using a single control character e.g. CHAR (28)
USE [DBNAME]
GO
DECLARE @return_value int
EXEC @return_value = [dbo].[SearchAllTables]
    @SearchStr = N'CHAR (28)'
SELECT 'Return Value' = @return_value
GO
Removing the N'' from the @SearchStr in the example above results in the error message:
Incorrect syntax near '28'
Can anyone help with a way of adapting this procedure to allow the search of control characters?

I would opt for a dynamic CharIndex(). Consider the following:
Declare @ColumnName varchar(25)='[SomeField]'
Declare @SearchFor nvarchar(max) ='CHAR(0),CHAR(1),CHAR(2),CHAR(3),CHAR(4),CHAR(5),CHAR(6),CHAR(7),CHAR(8),CHAR(9),CHAR(10),CHAR(11),CHAR(12),CHAR(13),CHAR(14),CHAR(15),CHAR(16),CHAR(17),CHAR(18),CHAR(19),CHAR(20),CHAR(21),CHAR(22),CHAR(23),CHAR(24),CHAR(25),CHAR(26),CHAR(27),CHAR(28),CHAR(29),CHAR(30),CHAR(31),CHAR(127)'
Set @SearchFor = 'CharIndex('+Replace(@SearchFor,',',','+@ColumnName+')+CharIndex(')+','+@ColumnName+')'
So your dynamic WHERE clause would look something like this:
' WHERE ' + @SearchFor + '>0'
Just for illustration, the @SearchFor string would look like this:
CharIndex(CHAR(0),[SomeField])+CharIndex(CHAR(1),[SomeField])+...+CharIndex(CHAR(31),[SomeField])+CharIndex(CHAR(127),[SomeField])
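To see why summing the CharIndex() calls works as a filter, here is a minimal, self-contained sketch. The table variable @Sample and its rows are made up purely for illustration, and only two of the control characters are checked to keep it short.

```sql
-- Hypothetical sample data; @Sample and its contents are illustration only.
DECLARE @Sample TABLE (SomeField varchar(50));

INSERT INTO @Sample (SomeField)
VALUES ('clean value'),
       ('has a tab' + CHAR(9) + 'inside'),
       ('ends with a file separator' + CHAR(28));

-- Each CHARINDEX returns 0 when the character is absent, so the sum is
-- greater than 0 only if at least one of the listed characters is present.
SELECT SomeField
FROM @Sample
WHERE CHARINDEX(CHAR(9), SomeField) + CHARINDEX(CHAR(28), SomeField) > 0;
```

The query should return the two rows that contain a control character and skip the clean one. The generated string above simply extends the same sum to all 33 characters.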

It looks like QUOTENAME is what is breaking things for you. When you try to use certain characters - such as char(0) - it returns NULL. Because of this, you are probably better off manually putting the single quotes yourself.
This means you would want to change this part:
INSERT INTO #Results
EXEC
(
    'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630)
    FROM ' + @TableName + ' (NOLOCK) ' +
    ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
)
to this:
INSERT INTO #Results
EXEC
(
    'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630)
    FROM ' + @TableName + ' (NOLOCK) ' +
    ' WHERE ' + @ColumnName + ' LIKE ''' + @SearchStr + ''' -- Note the use of @SearchStr (Not @SearchStr2) and the additional quotes to wrap your search string in.
)
Which should allow you to use your %[...]% pattern matching syntax.
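As for the call itself: passing N'CHAR (28)' searches for that literal text, and arguments to EXEC cannot be expressions, which is why dropping the N'' produced a syntax error. The pattern has to be built in a variable first. Below is a minimal sketch, assuming the procedure has been changed as described above so that @SearchStr is used verbatim as the LIKE pattern. The names @Pattern and @Code are invented for this example, and CHAR(0) is deliberately left out because how it compares can depend on the collation, so test that case separately.

```sql
-- Build the pattern '%[<control characters>]%' in a variable.
-- @Pattern and @Code are hypothetical names used only in this sketch.
DECLARE @Pattern nvarchar(100) = N'%[';
DECLARE @Code int = 1;              -- CHAR(0) skipped on purpose (see note above)

WHILE @Code <= 31
BEGIN
    SET @Pattern += NCHAR(@Code);   -- append control characters 1 through 31
    SET @Code += 1;
END;

SET @Pattern += NCHAR(127) + N']%'; -- add DEL and close the character set

-- Pass the variable, not an expression, to the procedure.
EXEC dbo.SearchAllTables @SearchStr = @Pattern;
```

The finished pattern is 36 characters long, so it fits in the procedure's nvarchar(100) parameter.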

Concerns:
Performance
As you probably know, a wildcard (%) at the beginning of the LIKE argument makes the predicate non-sargable: SQL Server has no idea where the matching values will be, so it cannot seek on an index and, at best, scans the whole index or table.
Worse, the final EXEC statement makes SQL Server jump through hoops. The variables are only assigned at execution time, so every concatenated statement has to be compiled on the fly, and the resulting plans may differ from one execution to the next.
An example of what might be unleashed occurred on one of my DBs a
month ago, where a terrible new plugin ran a simple query looking for
one row with just two badly parameterized predicates on a large table of 1
million rows. Yet, the Optimizer swallowed up trillions of IOs in a
matter of seconds (the query came and went too fast for a governor)
and sent 2 billion rows PER QUERY through the network.
Tragically, the issue was zombied that day, and with just 500 one-row
result sets in my database running repeatedly, it brought down our
server.
Isolation and Transactions
At a guess, expect locking issues and heavy resource consumption. Major operations such as UPDATE statements, index rebuilds, and ALTER statements will either be forced to wait or kick your query to the curb. Even READ UNCOMMITTED will not save you from every blocking issue.
A New Approach
All of those characters you listed are neither letters nor numbers, but meaningless garbage (to SQL Server) that flows in from a front end application. I noticed you excluded Microsoft System Tables, so where does your data flow come from and how is it disseminated throughout the database? Who is at fault? How does the system, user, and designer play a role in the mess?
Is this Server an OLTP or READ heavy? Does your org not have a capable SSIS, ETL system to prevent garbage from wreaking havoc on your server?
Database Constraints
Why does your application fail to cleanse the data before sending it? And when the data does reach the database level, why can we not use both the DATA TYPE and TABLE CONSTRAINTS to our advantage? Simple measures, such as using DATE instead of VARCHAR for storing dates, or normalizing instead of storing blobs so that read-heavy tables are isolated from write-heavy ones, can work wonders.
Admittedly, CHECK constraints can slow down your INSERT statements, so you may need to think about the larger impact.
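To make the constraint idea concrete, here is a rough sketch of a CHECK constraint that keeps a few control characters out of a column in the first place. The table dbo.ConstraintDemo, its column, and the constraint name are hypothetical, and only tab, line feed, and carriage return are rejected; extend the character set to suit your data.

```sql
-- Hypothetical table: rejects values containing tab, line feed, or carriage return.
CREATE TABLE dbo.ConstraintDemo
(
    SomeField varchar(50) NOT NULL,
    CONSTRAINT CK_ConstraintDemo_NoControlChars
        CHECK (SomeField NOT LIKE '%[' + CHAR(9) + CHAR(10) + CHAR(13) + ']%')
);

-- Passes the constraint.
INSERT INTO dbo.ConstraintDemo (SomeField) VALUES ('clean value');

-- Should fail with a CHECK constraint violation because of the embedded tab.
INSERT INTO dbo.ConstraintDemo (SomeField) VALUES ('bad' + CHAR(9) + 'value');
```

The check runs on every INSERT and UPDATE of that column, which is the cost mentioned above.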
Preventative vs Prescriptive
I could write a query that would solve your current question (encapsulating the EXEC statements in another stored procedure enables proper parameter sniffing), but we need to ask more and write less code. Your procedure is terrible now and always will be, even if we window-dress it. It masks the real issue of how those control characters got there in the first place and forces expensive queries on your poor system.
How your tables relate to each other, their normalization, and their cardinality should mean something to you, so that you can discriminate not only between types of tables but also between the specific columns they possess. Your current approach would be disastrous on many of my databases, which can reach 1.5+ terabytes in size.
The more you gather your requirements, the better your answer will be. Heck, even setting up a database entirely for ETL would be superior to your current solution. And even if you still end up running a similar query, at least you will have shortened your list of columns and tables to a minute, understandable list instead of blindly inflicting pain on everyone in your company.
Best of wishes!

Related

Deadlock MS-SQL when Inserting Into a Table

I am using the following query to insert the changes made to a given table into its respective historical table. I execute the same query simultaneously for multiple tables from Python (changing the table name and database). None of the historical tables have foreign keys, and each table is assigned its own historical table, yet some of the executions end in a deadlock. I am not sure how to solve the issue. Is it because I use a table variable with the same name in all the executions?
declare @name_tab table (name_column varchar(200),
                         dtype varchar(200))
declare @columns varchar(max)
declare @query varchar(max)
declare @database varchar(200)
declare @table_name varchar(200)
set @database = '%s'
set @table_name = '%s'
insert into @name_tab
select c.name as name_column,
       t.name as dtype
from sys.all_columns c
INNER JOIN sys.types t
    ON t.system_type_id = c.system_type_id
where OBJECT_NAME(c.object_id) = @table_name
set @columns= stuff((select ','+name_column from @name_tab FOR XML PATH('')),1, 1, '')
set @query= 'insert into ' +@database+'..'+'HISTORY_'+@table_name+' select super_q.* from' +
'(select cast (GETDATE() as smalldatetime) as TIME_MODIFIED, new_info.* from '+
'(SELECT ' + @columns + ' From '+@database+'..'+@table_name +
' except ' +
'SELECT ' + @columns + ' From '+@database+'..'+'HISTORY_'+@table_name + ') new_info) as super_q'
execute(@query)
I got this sample from system_health
It appears that some concurrent process is altering or creating a table at the same time. The deadlock XML should contain additional details about what's going on.
But whatever the actual cause, the solution is simple. Use your scripting above to generate the trigger bodies in static SQL so you don't have to query the catalog for every insert.
Create a procedure in your database called, say, admin.GenerateHistoryTables and one called admin.GenerateHistoryTriggers and run those ahead of time to install the history tables and wire up the triggers.
Or stop re-inventing the wheel and use Change Data Capture or Temporal Tables.
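For the temporal table route, a minimal sketch follows. It assumes SQL Server 2016 or later, and dbo.Customer, dbo.CustomerHistory, and the column names are hypothetical stand-ins for one of your tables.

```sql
-- Hypothetical system-versioned temporal table (SQL Server 2016+).
-- SQL Server maintains dbo.CustomerHistory automatically on UPDATE and DELETE.
CREATE TABLE dbo.Customer
(
    CustomerId int           NOT NULL PRIMARY KEY CLUSTERED,
    Name       nvarchar(100) NOT NULL,
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

-- Read the full change history, current and past row versions together.
SELECT CustomerId, Name, ValidFrom, ValidTo
FROM dbo.Customer FOR SYSTEM_TIME ALL
ORDER BY CustomerId, ValidFrom;
```

With this in place there is no EXCEPT comparison and no catalog lookup at load time, so the concurrent Python jobs no longer need to run the history query at all.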

Export Database as Scripts; Sequences not set correctly

I want to take a database from a SQL Server 2016 instance and set it up on a 2014 server at a client's site.
In SSMS I select Tasks => Generate scripts... and get a SQL file containing all CREATE TABLE statements and the like.
I'm under the impression that the sequences are not generated correctly. The sequences are in use and have current values larger than one. However, every CREATE SEQUENCE statement has a START WITH 1 clause.
Can I somehow make the sequences get a start value that matches their current value?
Using system tables (sys.sequences in this case) you can generate a script that alters the current value of all your sequences.
More info on sys.sequences system table can be found here.
First of all run the following script on your SQL Server 2016 database:
DECLARE @sql NVARCHAR(max)=''
SELECT @sql = @sql + ' ALTER SEQUENCE ' + [name]
+ ' RESTART WITH '+ cast ([current_value] AS NVARCHAR(max))
+ CHAR(13) + CHAR(10)
FROM sys.sequences
PRINT @sql
The output should be a list of ALTER SEQUENCE statements that contain the current values for all your sequences; you can now add these statements at the end of the scripts you generated from SSMS.
For example, in my test DB this is the result of the previous script:
The answer by Andrea is not entirely correct. The current_value column of sys.sequences contains the value that was already handed out previously, so the sequence should restart at the next increment value. Some extra casting is needed because the values are stored as sql_variant.
DECLARE @sql NVARCHAR(max)=''
SELECT @sql = @sql + N' ALTER SEQUENCE ' + s.name + N'.' +sq.name
+ N' RESTART WITH '+ cast (cast(sq.current_value as bigint) + cast(sq.increment as bigint) AS NVARCHAR(20))
+ CHAR(13) + CHAR(10)
FROM sys.sequences sq
join sys.schemas s on s.schema_id = sq.schema_id
PRINT @sql

sp_rename failing when called from inside another stored procedure

First post from a self-taught data warehouse guy. I've done lots of searching and reading to get where I am now, but can't get past this sticking point.
Background: as part of our nightly ETL job, we have to copy many tables from many remote DBs (linked servers) into staging-area DBs. After table copies have finished, I continue with the transformation from the staging area DBs into production tables.
Since the remote DBs all have identical schema, I made a stored procedure in the production DB to do the work. The stored procedure accepts parameters of the remote database name and the table name. In the nightly job, SQL Server Agent runs an SSIS package; the package contains one (retry-looping) SSIS task for each remote database; all the tasks run concurrently; each task uses a variable to pass the DB name to SQL file; then the SQL file calls the stored procedure once for each table.
Example remote table and local staging-area table:
Remote: [FLTA].[cstone].[csdbo].[CLIENT]
Local: [FLTAL].dbo.[FLTA CLIENT]
The stored procedure is pretty simple, dropping the old table and using SELECT to make a fresh copy from the remote DB. It looks approximately like this:
CREATE PROCEDURE dbo.spTableCopyNew
(@p VARCHAR(50), @Tablename VARCHAR(50))
AS
-- Drop the existing table
EXEC('IF OBJECT_ID(''[' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +']'', ''U'') IS NOT NULL
DROP TABLE [' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +']'
)
-- Copy the new table
EXEC('SELECT * into [' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +']
FROM [' + @p + '].[cstone].[csdbo].[' + @Tablename +']'
)
GO
The SQL looks roughly like this:
-- Set local variables for the remote server connection, the local database name, and the table prefix
DECLARE @Prefix varchar(50)
-- Accept the variables passed in from the SSIS task
SET @Prefix = ?
-- Copy the two tables
EXEC Datawarehouse.dbo.spTableCopy @Prefix, 'CLIENT'
EXEC Datawarehouse.dbo.spTableCopy @Prefix, 'PATIENT'
Maintenance is a breeze: when we need to grab a new table from all the remote databases, I just add it to the "productionLoad.sql" file.
It works really well...except when it doesn't.
Due to un-figured-out-yet reasons, sometimes a table fails to copy. And since I'm dropping the existing table before copying the new one, this will sometimes break things further down the line. My SSIS tasks will retry up to three times per remote DB, so occasional failures are no big deal. But if the same remote DB has three failures in one night, I'm gonna have a bad time.
My current attempt at a solution is to copy the remote table to a temp table, then ONLY AFTER that copy is successful, drop the local table and rename the temp table to the "real" table. Which brings me to the problem:
I can't get sp_rename to work when called from a stored procedure, to rename tables that exist in a different database than the stored procedure. I've created new variables to resolve expressions, then send those variables to sp_rename, since I can't pass expressions into that stored procedure.
Here's my attempt at a new stored procedure:
CREATE PROCEDURE dbo.spTableCopy
(@p VARCHAR(50), @Tablename VARCHAR(50))
AS
BEGIN
    EXEC('USE [' + @p + 'L]')
    -- Create variables for schema and table names
    -- Since sp_rename can accept variables, but not expressions containing variables.
    DECLARE @RemoteTable VARCHAR(50) = '[' + @p + '].[cstone].[csdbo].[' + @Tablename +']'
    DECLARE @LocalTableTemp VARCHAR(50) = '[' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +'_temp]'
    DECLARE @LocalTable VARCHAR(50) = '' + @p + ' ' + @Tablename + ''
    -- Check for previous temp table and drop it
    EXEC('IF OBJECT_ID(''[' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +'_temp]'', ''U'') IS NOT NULL
    DROP TABLE [' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +'_temp]'
    )
    -- Copy the new table
    EXEC('SELECT * into ' + @LocalTableTemp + '
    FROM ' + @RemoteTable + ''
    )
    -- Drop the existing table
    EXEC('IF OBJECT_ID(''[' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +']'', ''U'') IS NOT NULL
    DROP TABLE [' + @p + 'L].dbo.[' + @p + ' ' + @Tablename +']'
    )
    -- Rename temp table to real table
    EXEC sp_rename @LocalTableTemp, @LocalTable
END
GO
This all works when executing it as normal SQL code, but when I make it into a stored procedure, sp_rename fails (everything else works). The final table [FLTAL CLIENT_temp] is there and contains the right data.
sp_rename returns the following error:
Msg 290, Level 16, State 2, Procedure sp_rename, Line 318
Invalid EXECUTE statement using object "Object", method "LockMatchID".
I've fought with this way too long.
Am I just screwing up the syntax?
Can I get sp_rename to work on other DBs with "USE?"
If not, will it work if I make a copy of my sp_tableCopy in every staging-area DB?
If not, will TRY...CATCH work inside this stored procedure, even if I call this stored procedure many times concurrently?
What else can I do to recover from failed table copies?
My alternate solution that I haven't pursued yet: after the temp table is successfully created, TRUNCATE the existing table and insert everything from the temp table into the real table. That seems messy though.
P.S. Our IT guys are "looking into" the nature of the copy failures.
Try This...
USE
EXEC ..sp_rename '..', '<target_table>'
USE sourcedb
EXEC targetdatabase..sp_rename 'schema.oldtable', 'target_table'
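Applied to the procedure in the question, the same idea can be sketched as below. Two things are worth knowing first: a USE issued inside EXEC('...') only lasts for that dynamic batch, so it does not change the database for the rest of the procedure, and sp_rename works on objects in the database it runs in. Prefixing the procedure name with the staging database makes it run there. The variable names @RenameProc, @OldName, and @NewName are invented for this sketch, and the two DECLAREs at the top stand in for the procedure's parameters.

```sql
-- Hypothetical values standing in for the stored procedure's parameters.
DECLARE @p varchar(50) = 'FLTA', @Tablename varchar(50) = 'CLIENT';

-- Three-part name of sp_rename in the staging database, e.g. [FLTAL].sys.sp_rename.
DECLARE @RenameProc nvarchar(400) = QUOTENAME(@p + 'L') + N'.sys.sp_rename';

-- Old name is schema-qualified only (no database part); new name is a bare name.
DECLARE @OldName nvarchar(400) = N'dbo.' + QUOTENAME(@p + ' ' + @Tablename + '_temp');
DECLARE @NewName sysname       = @p + ' ' + @Tablename;

-- EXECUTE accepts a variable holding the module name, so no dynamic SQL string is needed.
EXEC @RenameProc @objname = @OldName, @newname = @NewName;
```

This keeps a single copy of the procedure in the production database instead of one per staging database.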

How to search all instance for regex value in SQL Server?

I work in a secure environment and need a way of running a script against our SQL instances i.e. all dbs, tables etc. to search for the use of certain values and to show where they are... is there any way of doing this? I've scoured the net but can't seem to find this!
I've put together this script with help from various sources (vyaskn) but need help in expanding the code to include db's and regex functionality. I just don't have enough experience in using the system views and dynamic SQL to do it myself. It's more important that I get the regex stuff working if searching through all dbs is more difficult.
sp_configure 'clr enabled',1
reconfigure
DECLARE @SearchStr NVARCHAR(100)
-- search for uk phone number for example
SET @SearchStr = '(((\+44)? ?(\(0\))? ?)|(0))( ?[0-9]{3,4}){3}'
CREATE TABLE #Results
(
    ColumnName NVARCHAR(370) ,
    ColumnValue NVARCHAR(3630)
)
SET NOCOUNT ON
DECLARE @TableName NVARCHAR(256) ,
        @ColumnName NVARCHAR(128) ,
        @SearchStr2 NVARCHAR(110)
SET @TableName = ''
SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%', '''')
WHILE @TableName IS NOT NULL
BEGIN
    SET @ColumnName = ''
    SET @TableName = ( SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.'
                                  + QUOTENAME(TABLE_NAME))
                       FROM INFORMATION_SCHEMA.TABLES
                       WHERE TABLE_TYPE = 'BASE TABLE'
                       AND QUOTENAME(TABLE_SCHEMA) + '.'
                           + QUOTENAME(TABLE_NAME) > @TableName
                       AND OBJECTPROPERTY(OBJECT_ID(QUOTENAME(TABLE_SCHEMA)
                                                    + '.'
                                                    + QUOTENAME(TABLE_NAME)),
                                          'IsMSShipped') = 0
                     )
    WHILE ( @TableName IS NOT NULL )
          AND ( @ColumnName IS NOT NULL )
    BEGIN
        SET @ColumnName = ( SELECT MIN(QUOTENAME(COLUMN_NAME))
                            FROM INFORMATION_SCHEMA.COLUMNS
                            WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
                            AND TABLE_NAME = PARSENAME(@TableName, 1)
                            AND DATA_TYPE IN ( 'char',
                                               'varchar',
                                               'nchar',
                                               'nvarchar',
                                               'int', 'decimal' )
                            AND QUOTENAME(COLUMN_NAME) > @ColumnName
                          )
        IF @ColumnName IS NOT NULL
        BEGIN
            INSERT INTO #Results
            EXEC
            ( 'SELECT ''' + @TableName + '.'
              + @ColumnName + ''', LEFT('
              + @ColumnName + ', 3630) FROM '
              + @TableName + ' (NOLOCK) ' + ' WHERE '
              + @ColumnName + ' LIKE ' + @SearchStr2
            )
        END
    END
END
SELECT ColumnName ,
       ColumnValue
FROM #Results
DROP TABLE #Results
--sp_configure 'clr enabled',0
--reconfigure
First you need to find all available SQL Server instances within the particular network. You can check out this example for sample code. Once you have the information about all the instances (server name, instance name, version, and databases), you need a login and password for each one in order to read from any table in any database. Then open a connection to each database and run the required search.
You can also make use of Full-Text Indexing for the search itself, though note that it matches words and phrases rather than regular expressions.
I've had success using one of the tools from EMS to do a regex replace on an Oracle DB schema. By now, I forget which tool I used, but I was able to export the schema to an SQL file and manipulate it as plain text.
You might want to try out the SQL Manager and the DB Extract products. They both have free trials, and both support a wide range of databases.
Once you get your schema, data, and stored procedures into a single SQL file, running a search should be no problem.

SQL Script To Dynamically Coalesce Multiple and Disparately Named Fields Across Multiple DB's

I am attempting to write a single SQL Server script (for Reporting Services) that can run against multiple databases (Reporting Services will determine the DB to run against). The problem is that I have one table in DB that can vary from database to database and change in the number and name of columns. Here is an example of the table that could change from DB to DB.
Database 1:
IDField
FieldA
FieldB
Database 2:
IDField
FieldX
FieldY
FieldZ
I now want to write a script that returns the values of the arbitrary fields and coalesce them into one text value (so that Reporting Services only needs to be designed to have a single column for these fields). Here's what I want the output to look like:
When run on Database 1:
"IDFieldValue" | "FieldA:value, FieldB:value"
When run on Database 2:
"IDFieldValue" | "FieldX:value, FieldY:value, FieldZ:value"
I know I could do this with a cursor, but that's very resource and time intensive. I was hoping to do this with straight SQL. Any thoughts as to how?
This will be pretty close to what you're looking for. It uses dynamic sql.
I only spent enough time to provide the concept, but I think once you look it over you'll see where you can modify in order to format just like you want.
Hint: replace YOURTABLENAME with the actual name of the table you're running against.
declare @pQuery varchar(max)
declare @pFields varchar(max)
SELECT @pFields = COALESCE(@pFields + '''' + ' ' +column_name + ': ''+ ', '') + 'Cast(' + column_name + ' AS VarChar(max)) + '
FROM information_schema.columns WHERE table_name = 'YOURTABLENAME';
SELECT @pQuery = 'SELECT ' + @pFields + '''' + '''' + ' AS [ReportRow] FROM YOURTABLENAME';
SELECT @pQuery /** just to see what you've created */
exec(@pQuery); /** execute dynamic sql and get back concatenated rows */
This script runs on a single table, but once you see the approach it wouldn't be too much more work to modify the script to build your dynamic sql statement against many tables.
In response to the question in comments:
Don't modify the first select clause. Modify the where clause of the first select statement, and the second select statement (assigning to @pQuery)
SELECT @pFields = COALESCE(@pFields + '''' + ' ' +column_name + ': ''+ ', '') + 'Cast(' + column_name + ' AS VarChar(max)) + '
FROM information_schema.columns WHERE table_name = 'YOURTABLENAME'
AND column_name <> 'IDField' ;
SELECT @pQuery = 'SELECT IDField, ' + @pFields + '''' + '''' + ' AS [ReportRow] FROM YOURTABLENAME';
Hopefully your ID columns are all named the same. If not, you'll have to select from information_schema where table_name = 'yourtablename' and ordinal_position = 1 into a variable and use that in place of the string literals above. In this case, hopefully your ID field is the first field in the table :)
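If the ID column does have a different name in each database, the lookup by ordinal position could be sketched as follows. This assumes the ID column really is the first column of the table; @idColumn is a name made up for this example, and QUOTENAME is added so that column names containing spaces still work.

```sql
DECLARE @idColumn sysname;
DECLARE @pFields  varchar(max);
DECLARE @pQuery   varchar(max);

-- Assumption: the ID column is the first column of the table.
SELECT @idColumn = column_name
FROM information_schema.columns
WHERE table_name = 'YOURTABLENAME' AND ordinal_position = 1;

-- Same concatenation as before, but excluding whatever the ID column is called.
SELECT @pFields = COALESCE(@pFields + '''' + ' ' + column_name + ': ''+ ', '')
                + 'Cast(' + QUOTENAME(column_name) + ' AS VarChar(max)) + '
FROM information_schema.columns
WHERE table_name = 'YOURTABLENAME' AND column_name <> @idColumn;

SELECT @pQuery = 'SELECT ' + QUOTENAME(@idColumn) + ', ' + @pFields + '''' + ''''
               + ' AS [ReportRow] FROM YOURTABLENAME';

SELECT @pQuery; /** just to see what you've created */
exec(@pQuery);  /** execute dynamic sql and get back concatenated rows */
```

If the table is not found, @idColumn and @pQuery stay NULL and the final exec does nothing.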
