Is CAST necessary when writing non-varchar columns to a text column? - sql-server

I have a trigger that takes the columns (and their values) from the inserted table and inserts them as text into an audit table, for example:
INSERT INTO audit
(tablename, changes)
SELECT
'mytable',
'id=' + cast(id as nvarchar(50)) + ';name=' + name + ';etc...'
FROM
inserted
I have large tables with most columns being non-varchar. In order to concatenate them into a string I need to cast each and every column.
Is it necessary to do so? Is there a better way?
The second (unaccepted) answer in this question concatenates the values smartly using XML and CROSS APPLY.
Is there a way to expand it to include the column names so that the final result would be:
'id=1;name=myname;amount=100;' etc....

Yes, you need to cast non-character data to a string before you can concatenate it. You might want to use CONVERT instead for data that is susceptible to regional formatting (I'm specifically thinking of dates here) to ensure you get a deterministic result. You also need to handle nullable columns, e.g. ISNULL(CAST(MyColumn AS VARCHAR(100)), ''), because concatenating a NULL makes the whole string NULL.
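For example, a hedged sketch of the audit INSERT putting those pieces together ("name" and "created" are illustrative column names, not taken from the real table):
-- Sketch only: "name" and "created" are illustrative columns. Style 126
-- (ISO 8601) keeps the date text deterministic regardless of regional settings.
insert into audit (tablename, changes)
select
    'mytable',
    'id=' + cast(id as nvarchar(50)) +
    ';name=' + isnull(name, '') +
    ';created=' + isnull(convert(nvarchar(30), created, 126), '')
from inserted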

Why not just save it as xml?
select * from inserted for xml auto
It usually takes less space than a simple text column, and you can treat it as (relatively) normalized data. Most importantly, you don't have to handle converting all the complicated stuff manually (how does your code handle line endings, quotes, and so on?).
In fact, you can add indexes to xml columns, so it can even be practical to search through. Even without indexes, it will be much faster to search for, say, all records changed in mytable that set name to some value.
And of course, you can keep the same piece of code for all your audited tables - no need to keep them in sync with the table structures etc. (unless you want to select only some explicit columns, of course).
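As a minimal sketch of what the whole trigger body could collapse to under this approach (assuming the audit table's changes column is, or is changed to, the xml type):
-- Sketch only: assumes audit.changes is an xml column
insert into audit (tablename, changes)
values ('mytable', (select * from inserted for xml auto, type));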

Related

Splitting a VARCHAR column into multiple columns

I am struggling to split the data in the column into multiple columns.
I have data of names of customers and the data needs cleaning as there can be duplicates and I also need to set up new standards for the future data.
I have been able to successfully split the first two words in the string, but I am not able to split the rest.
I only have read permissions. So I cannot create any functions.
For example:
Customer name: Illinois Institute of Technology
My query will only fetch "Illinois" in one column and "Institute of Technology" in the other. Using a space as the delimiter, I am looking to separate each word into its own column; I am not sure how to identify the second space and the ones after it.
I have also tried the 'parsename' function, but I feel it will create more difficulty in cleaning the data.
select name,
left (name, CHARINDEX(' ', name)) as f,
substring(name, CHARINDEX(' ', name)+1, len(name)) as s
from customer
EDIT: This only works for SQL Server 2016 and above. OP has SQL Server 2014.
There isn't really a good way to do this, but here's one method that might work for you, modified from an example here:
create table #customer (id int, name nvarchar(max))
insert into #customer
values (1, 'Illinois Institute of Technology'),
(2, 'The City University of New York'),
(3, 'University of the District of Columbia'),
(4, 'Santa Fe University of Art and Design')
;
with c as (
    select id, name, value,
        row_number() over (partition by id order by (select null)) as rn
    from #customer
    cross apply string_split(name, ' ') as bk
)
select id, name, [1], [2], [3], [4], [5], [6]
from c
pivot (
    max(value)
    for rn in ([1], [2], [3], [4], [5], [6])
) as pvt
drop table #customer
Notice a few things:
1. You have to explicitly declare the columns in the output. You could create some overly-complex dynamic SQL that would generate as many column names as you need, but that makes it harder to fix issues and make modifications, and you probably won't get the same query optimisations.
2. Because of (1), you will just end up dropping words if there are too many to fit the number of columns you've defined. See the last example, id=4.
3. Beware of other methods that might not keep your words in order, or that skip duplicate words, e.g. "of" in the example id=3.
You don't mention what you plan to do with the data once you retrieve it. Since you only have read permissions you can't store it in a table. Something you may not have thought of is to create a local database where you do have write permissions and do your work there. The easiest way would be to get a copy of the database, but you could also access the readonly database using fully qualified names.
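For example (ScratchDB and ReadOnlyDB are hypothetical names here):
-- Hypothetical names: work in a database you own, reading the read-only
-- database through fully qualified three-part names
use ScratchDB;
select id, name
into dbo.customer_work      -- a local working copy you can modify freely
from ReadOnlyDB.dbo.customer;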
As for your string-splitting needs, I can point you to a great way of splitting strings created by a gentleman named Jeff Moden. You can find an article discussing it as well as a link to code here:
Tally OH! An Improved SQL 8K “CSV Splitter” Function
It's a very informative read, but since much of it is about performance testing, you might want to skip ahead to the code. Do try to pick out the parts that discuss functionality, though, because the code will strike the uninitiated as unconventional at best.
The code creates a function but because you don't have permission to do that you will have to remove the meat from the function and use it directly.
I'll try to provide a little overview of the approach to get you started.
The heart of the approach is a Tally Table. If you are unfamiliar with the term (it is also called a Numbers table), it is simply a table in which every row contains an integer, covering every integer in some range, usually a pretty large one. So how does a Tally Table help with splitting strings? The magic happens by joining the tally table to the table containing the strings to be split, and using the where clause to identify the delimiters by looking at one-character substrings indexed by the tally numbers. SQL Server's natural set-based operations then find all of your delimiters in one go, and your select list extracts the substrings bracketed by the delimiters. It's really quite clever and very fast.
When you get into the code, the first part of the function may look very strange (because it is), but it is necessary since you only have read rights. It is basically using SQL Server's Common Table Expression (CTE) functionality to build an internal tally table on the fly using some ugly logic which you don't really need to understand (but if you want to dig in, it's clever even if it is ugly). Since the table is only local to the query it won't violate your readonly permissions.
It also uses CTEs to represent the starting index and length of the delimited substrings so the final query is pretty simple, yielding rows with a row number followed by a string split from the original data.
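To give a flavour of what that looks like inline, here is a bare-bones sketch of the tally idea against the #customer table from the earlier answer. It assumes a single-space delimiter and strings up to 10,000 characters, and is nowhere near as robust or as fast as Jeff Moden's actual DelimitedSplit8K:
-- Bare-bones sketch: a tally table built on the fly from CTEs (no write
-- permissions needed), then delimiter positions found set-based.
with e1(n)    as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)),
     e4(n)    as (select 1 from e1 a cross join e1 b cross join e1 c cross join e1 d),
     tally(n) as (select row_number() over (order by (select null)) from e4)
select c.id,
       row_number() over (partition by c.id order by s.start_pos) as item_number,
       substring(c.name, s.start_pos,
                 isnull(nullif(charindex(' ', c.name, s.start_pos), 0)
                        - s.start_pos, 8000)) as item
from #customer c
cross apply (select 1 as start_pos           -- an item starts at position 1...
             union all
             select t.n + 1                  -- ...and after every space
             from tally t
             where t.n <= len(c.name)
               and substring(c.name, t.n, 1) = ' ') s;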
I hope this helps you with your task -- it really is a nice tool to have in your toolkit.
Edit: I just realized that you wanted your output in separate columns rather than rows. That's quite a bit more difficult, since each of the strings you are splitting might produce a different number of pieces, and your columns will need names. If you already know the column names and the number of output strings, it will be easier but still tricky. The row data from the splitter could be tweaked to carry an identifier for the row where the data originated, and the row numbers might help with creating arbitrary column names if you need them. The big problem is that with only read privileges you will find processing things in steps rather tricky; CTEs can be employed even further for this, but your code will likely get rather messy unless the requirements are pretty simple.

Iterative UPDATE loop in SQL Server

I would really like to find some kind of automation for this issue I am facing.
A client has had a database attached to their front end site for a few years now, and until this date has been inputting certain location information as a numeric code (i.e. County/State data).
They now would like to replace these values with their corresponding nvarchar values (e.g. instead of having '8' in their County column, they want it to read 'Clermont County', and so on, for upwards of 90 separate entries).
I have been provided with a 2-column excel sheet, one with the old county numeric code and one with the text equivalent they request. I have imported this to a temp table, but cannot find a fast way of iteratively matching and updating these values.
I don't really want to write a 90-line CASE WHEN block and type out each county name manually; that opens the door to human error.
Is there something much simpler I don't know about what I can do here?
I realize that it might be a bit late, but just in case someone else is searching and comes across this answer...
There are two ways to handle this: In Excel, or in SQL Server.
1. In Excel
Create a concatenated string in one of the available columns that meets your criteria, i.e.
=CONCATENATE("UPDATE some_table SET some_field = '",B2,"' WHERE some_field = ",A2)
You can then auto-fill this column all the way down the list, and thus get 90 different update statements which you can then copy and paste into a query window and run. Each one will say
UPDATE some_table SET some_field = 'MyCounty' WHERE some_field = X
Each one will be specific to a case; therefore, you can run them sequentially and get the desired result, or...
2. In SQL Server
If you can import the data to a table then all you need to do is write a simple query with a JOIN which handles the case, i.e.
UPDATE T1
SET T1.County_Name = T2.Name
FROM Some_Table T1 -- The original table to be updated
INNER JOIN List_Table T2 -- The imported table from an Excel spreadsheet
ON T1.CountyCode = T2.Code
;
In this case, each row of your original Some_Table is joined to the imported data on CountyCode, and the name field is updated with the name matching that code in the imported data, which gives you the same result as the Excel option, minus a bit of typing.

TSQL - Get maximum length of data in every column in every table without Dynamic SQL

Is there a way to get maximum length of data stored in every column in the database? I have seen some solutions which used Dynamic SQL, but I was wondering if it can be done with a regular query.
Yes, just query the INFORMATION_SCHEMA.COLUMNS view for the database; you can get that information for all columns of all tables in the database if you desire. See the following for more details:
Information_Schema - COLUMNS
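A minimal example (note that CHARACTER_MAXIMUM_LENGTH is NULL for non-character types and -1 for the MAX types):
-- Declared (not actual) maximum length of every character column
select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH
from INFORMATION_SCHEMA.COLUMNS
where CHARACTER_MAXIMUM_LENGTH is not null
order by TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;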
If you are talking about the length of the actual data in a column, and not the declared length of the column, I am afraid that is not achievable without dynamic SQL.
The reason is that there is only one way to retrieve data, and that is the SELECT statement. This statement, however, requires explicit column names that are part of the statement itself. There is nothing like
-- This does not work
select col.Data
from Table
where Table.col.Name='ColumnName'
So the answer is: No.

How can I generate an INSERT script for a table with a VARBINARY(MAX) field?

I have a table with a VARBINARY(MAX) field (SQL Server 2008 with FILESTREAM)
My requirement is that when I go to deploy to production, I can only supply my IT team with a group of SQL scripts to be executed in a certain order. A new table I am making in production has this VARBINARY(MAX) field. Usually with new tables, I will script out the CREATE TABLE script. And, if I have data I need to go with it, I will then script out the INSERT scripts. Not too complicated.
But with VARBINARY(MAX), the stored procedure I was using to generate the INSERT statements fails on that table. I tried selecting that field, printing it, copying it, converting it to hex, etc. The main issue is that none of this selects all the data in the field. I check DATALENGTH([FileColumn]), and while the source row contains 1,004,382 bytes, the most I can get into the copied or selected data for re-inserting is 8,000 bytes. So basically it is truncated (i.e. invalid) data...
How can I do this better? I tried Googling this like crazy but I must be missing something. Remember, I can't access the filesystem. This has to be all scripted.
If this is a one time (or seldom) thing to do, you can try scripting the data out from the SSMS Wizard as described here:
http://sqlblog.com/blogs/eric_johnson/archive/2010/03/08/script-data-in-sql-server-2008.aspx
Or, if you need to do this frequently and want to automate it, you can try the SQL# SQLCLR library (which I wrote and while most of it is free, the function you need here is not). The function to do this is DB_DumpData and it also generates INSERT statements.
But again, if this is a one time or infrequent task, then try the data export wizard that is built into Management Studio. That should allow you to then create the SQL script that you can run in Production. I just tested this on a table with a VARBINARY(MAX) field containing 3,365,964 bytes of data and the Generate Scripts wizard generated an INSERT statement with the entire hex string of 6.73 million characters for that one value.
UPDATE:
Another quick and easy way to do this in a manner that would allow you to copy / paste the entire INSERT statement into a SQL script and not have to bother with BCP or SSMS Export Wizard is to just convert the value to XML. First you would CONVERT the VARBINARY to VARCHAR(MAX) using the optional style of "1" which gives you a hex string starting with "0x". Once you have the hex string of the binary data you can concatenate that into an INSERT statement and that entire thing, when converted to XML, can contain the entire VARBINARY field. See the following example:
DECLARE @Binary VARBINARY(MAX) = CONVERT(VARBINARY(MAX),
                                         REPLICATE(
                                             CONVERT(NVARCHAR(MAX), 'test string'),
                                             100000)
                                        );

SELECT 'INSERT INTO dbo.TableName (ColumnName) VALUES (' +
       CONVERT(VARCHAR(MAX), @Binary, 1) + ')' AS [Insert]
FOR XML RAW;
Don't script from SSMS
bcp the data out/in, or use something like SSMS tools to generate INSERT statements
It's more than a bit messed up, but in the past and on the web I've seen this done using a base64-encoded string. You use an xml value to wrap the string, and from there you can convert it to a varbinary. Here's an example:
http://blogs.msdn.com/b/sqltips/archive/2008/06/30/converting-from-base64-to-varbinary-and-vice-versa.aspx
I can't speak personally to how effective or performant this is, though, especially for large values. Because it is at best an ugly hack, I'd tuck it away inside a UDF somewhere, so that if a better method is found you can update it easily.
I have never tried anything like this before, but from the documentation for SQL Server 2008 R2, it sounds like using SUBSTRING will work to get the entire varbinary value, although you may have to work with it in chunks, using UPDATEs with the .WRITE clause to append the data.
Updating Large Value Data Types
Use the .WRITE (expression, @Offset, @Length) clause to perform a partial or full update of varchar(max), nvarchar(max), and varbinary(max) data types. For example, a partial update of a varchar(max) column might delete or modify only the first 200 characters of the column, whereas a full update would delete or modify all the data in the column.
For best performance, we recommend that data be inserted or updated in chunk sizes that are multiples of 8040 bytes.
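A hypothetical sketch of appending one chunk (the table, column, and variable names are placeholders):
-- Placeholder names throughout. With a NULL offset, .WRITE appends the
-- expression to the end of the existing value.
DECLARE @Chunk VARBINARY(8000) = 0x0102030405;
UPDATE dbo.TableName
SET FileColumn.WRITE(@Chunk, NULL, NULL)
WHERE Id = 1;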
Hope this helps.

Bulk add quotes to SQL values

Not sure if this is an appropriate "programming" issue, but it's relevant.
Is there a way, via SQL or even Excel, to add quotes to a large number of values? For example, if I want to add the values a, b, c, d, e, f, g, j, etc. to a table, is there a way I can automatically add quotes to them, as in 'a','b', etc.?
I have this issue especially with select * from table where column in ('value1','value2')...
Thanks...
I usually tackle this sort of issue using Excel - if you enter your source values in a column, you can then just use an Excel formula to concatenate the values with quotes around them (eg =CONCATENATE("'", A1, "'")), and even extend this to build the complete SQL statement.
Not sure if this helps or is what you were asking, but for such queries you should use parameters, i.e. something like
SELECT ... WHERE column IN (?,?)
This is not only an easy approach, but also means that if somebody puts a single-quote in the value they give you, the form of your query can't be altered - that's how SQL Injection security attacks happen.
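If you are building the statement inside T-SQL itself, sp_executesql gives you the same protection. A sketch for a single value (an IN list needs one parameter per value; mytable and the column name are stand-ins):
-- Parameterized sketch: the user-supplied value never becomes part of the SQL text
exec sp_executesql
    N'select * from mytable where name = @val',
    N'@val nvarchar(100)',
    @val = N'value1';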
If you have the values in a column of another table, you could use a select query:
select '''' + cast(ValueColumn as nvarchar(4000)) + '''' as QuotedVal from MyTable
and then do your copy/paste
