Splitting a VARCHAR column into multiple columns - sql-server

I am struggling to split the data in a column into multiple columns.
I have customer name data that needs cleaning, since there can be duplicates, and I also need to set up new standards for future data.
I have been able to split the first two words in the string, but I am not able to split the data any further.
I only have read permissions. So I cannot create any functions.
For example:
Customer name: Illinois Institute of Technology
My query will only fetch "Illinois" in one column and "Institute of Technology" in the other column. Using a space as the delimiter, I am looking to separate each word into its own column. I am not sure how to identify the 2nd space and the spaces after it.
I have also tried the PARSENAME function, but I feel it will make cleaning the data more difficult.
select name,
left (name, CHARINDEX(' ', name)) as f,
substring(name, CHARINDEX(' ', name)+1, len(name)) as s
from customer

EDIT: This only works for SQL Server 2016 and above. OP has SQL Server 2014.
There isn't really a good way to do this, but here's one method that might work for you, modified from an example here:
create table #customer (id int, name nvarchar(max));

insert into #customer
values (1, 'Illinois Institute of Technology'),
       (2, 'The City University of New York'),
       (3, 'University of the District of Columbia'),
       (4, 'Santa Fe University of Art and Design');

with c as (
    select id, name, value,
           row_number() over (partition by id order by (select null)) as rn
    from #customer
    cross apply string_split(name, ' ') as bk
)
select id, name, [1], [2], [3], [4], [5], [6]
from c
pivot (
    max(value)
    for rn in ([1], [2], [3], [4], [5], [6])
) as pvt;

drop table #customer;
Notice a few things:
1) You have to explicitly declare columns in the output. You could create some overly-complex dynamic SQL that would generate as many column names as you need, but that makes it harder to fix issues and make modifications, and you probably won't get the same query optimisations.
2) Because of (1), you will just end up dropping words if there are too many to fit the number of columns you've defined. See the last example, id=4.
3) Beware of other methods that might not keep your words in order, or that skip duplicate words, e.g. "of" in the example id=3.
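As a side note, since STRING_SPLIT isn't available on SQL Server 2014: CHARINDEX accepts an optional start position as its third argument, which is how you can locate the second space the question was stuck on. A minimal sketch, not from the answer above (the column aliases are made up):
select name,
       charindex(' ', name)                           as first_space,
       charindex(' ', name, charindex(' ', name) + 1) as second_space -- 0 if there is no second space
from customer;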

You don't mention what you plan to do with the data once you retrieve it. Since you only have read permissions, you can't store it in a table. Something you may not have thought of is to create a local database where you do have write permissions and do your work there. The easiest way would be to get a copy of the database, but you could also access the read-only database using fully qualified names.
As for your string-splitting needs, I can point you to a great approach created by a gentleman named Jeff Moden. You can find an article discussing it, as well as a link to the code, here:
Tally OH! An Improved SQL 8K “CSV Splitter” Function
It's a very informative read, but since much of it is about performance testing, you may want to skip much of it and go straight to the code. Do pick out and read the parts that discuss functionality, though, because the code will strike the uninitiated as unconventional at best.
The code creates a function, but because you don't have permission to do that, you will have to lift the body out of the function and use it directly.
I'll try to provide a little overview of the approach to get you started.
The heart of the approach is a Tally Table. If you are unfamiliar with that term (it is also called a Numbers table), it is simply a table in which every row contains an integer, covering all the integers in some range, usually a large one. So how does a Tally Table help with splitting strings? The magic happens by joining the tally table to the table containing the strings to be split, and using the where clause to identify the delimiters by looking at one-character substrings indexed by the tally table numbers. SQL Server's natural set-based operations then find all of your delimiters in one go, and your select list extracts the substrings bracketed by the delimiters. It's really quite clever and very fast.
When you get into the code, the first part of the function may look very strange (because it is), but it is necessary since you only have read rights. It is basically using SQL Server's Common Table Expression (CTE) functionality to build an internal tally table on the fly using some ugly logic which you don't really need to understand (but if you want to dig in, it's clever even if it is ugly). Since the table is only local to the query it won't violate your readonly permissions.
It also uses CTEs to represent the starting index and length of the delimited substrings so the final query is pretty simple, yielding rows with a row number followed by a string split from the original data.
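To make the shape of the approach concrete, here is a minimal sketch of an inline tally-table splitter built with CTEs. It is not Jeff Moden's tuned code; the CTE names and the 8000-character cap are illustrative, and it assumes your customer table has an id column:
declare @delim char(1) = ' ';

with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) as t(n)), -- 10 rows
     e2(n) as (select 1 from e1 as a cross join e1 as b),                               -- 100 rows
     tally(n) as (select top (8000) row_number() over (order by (select null))
                  from e2 as a cross join e2 as b)                                      -- up to 8000 rows
select c.id,
       row_number() over (partition by c.id order by t.n) as item_number,
       substring(c.name, t.n,
                 isnull(nullif(charindex(@delim, c.name, t.n), 0) - t.n, 8000)) as item
from customer as c
cross join tally as t
where t.n <= len(c.name)
  and (t.n = 1 or substring(c.name, t.n - 1, 1) = @delim);
Each tally number that sits at the start of a word survives the where clause, and the substring expression reads up to the next delimiter (or the end of the string), yielding one row per word.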
I hope this helps you with your task -- it really is a nice tool to have in your toolkit.
Edit: I just realized that you wanted your output in separate columns rather than rows. That's quite a bit more difficult, since each string you split may produce a different number of pieces, and your columns will need names. If you already know the column names and the number of output strings, it will be easier, but still tricky. The row data from the splitter could be tweaked to carry an identifier for the row where the data originated, and the row numbers might help with creating arbitrary column names if you need them. The big problem is that with only read privileges, processing things in steps is awkward; CTEs can be employed even further for this, but your code will likely get rather messy unless the requirements are pretty simple.

Related

SQL query to search for patterns against a specific string

I am currently dealing with a table containing a list of hundreds of part number patterns for discount purposes.
For example:
1) [FR]08-[01237]0[67]4E-%
2) _10-[01]064[CD]-____
3) F12-[0123]0[67]4C-%
I have a search string, F10-1064C-02TY, and I am trying to find out which pattern(s) match that particular string. For this example, my query would return the second pattern. The objective is to find the correct part discount based on the matched pattern(s).
What is the best approach in handling this type of problem? Is there a simple and elegant approach or does this involve some complex TSQL procedure?
The pattern for the right side of a LIKE clause can be any expression, which includes values from a table. This means you can use your patterns table in a query i.e.
SELECT PatternId, Pattern
FROM Patterns
WHERE 'F10-1064C-02TY' LIKE Pattern
You can build from here - if your part numbers are stored in a different table, join to that table using the LIKE clause as a join criterion, or build a procedure that takes a part number parameter, or whatever else fits your requirements.
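For example, if the part numbers live in their own table, the join might look like this (the Parts table and PartNumber column are assumed names, not from the original question):
SELECT pt.PartNumber, p.PatternId, p.Pattern
FROM Parts AS pt
JOIN Patterns AS p
    ON pt.PartNumber LIKE p.Pattern  -- a part can match more than one pattern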

Select query within more than 150 different conditions in where clause

I have a table in my SQL Server database with more than 400,000 rows, and I want to select the full names that start with any of roughly 150 names stored in a .txt file. How should I build the query inside my C# command? I could write it this way, but it would be too long and might cause delays or bugs:
select *
from tableName
where fullName like '%Jack%'
or fullName like '%Wathson%'
--.... and so on
First, SQL Server can handle very long queries. I have created queries that are at least 150k characters, and they work without problem. The limit is considerably larger than that.
Second, you are correct that a bunch of like statements is going to take a long time. There is overhead to like.
Third, your patterns do not conform to your statement. If you want names that start with a particular pattern, then remove the wildcard from the beginning of the pattern. This has the added benefit that SQL Server can use a regular index on FullName for the match.
Finally, if you are really looking at initial strings, then you might want to consider a full text index (here is one place to start). These are usually more efficient than using like.
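One way to put these points together is to load the ~150 names from the .txt file into a temporary table and express the prefix match as a join instead of 150 OR clauses. This is only a sketch; the #names table and its prefix column are invented for illustration:
-- hypothetical holding table for the prefixes read from the .txt file
create table #names (prefix nvarchar(100));
insert into #names (prefix) values (N'Jack'), (N'Wathson'); -- ...and so on

select t.*
from tableName as t
join #names as n
    on t.fullName like n.prefix + '%'; -- no leading wildcard, so an index on fullName can be used

drop table #names;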

Is CAST necessary when writing non-varchar columns to a text column?

I have a trigger that takes the columns (and their values) from the inserted table and inserts them as text in an audit table, example:
INSERT INTO audit
(tablename, changes)
SELECT
'mytable',
'id=' + cast(id as nvarchar(50)) + ';name=' + name + ';etc...'
FROM
inserted
I have large tables with most columns being non-varchar. In order to concatenate them into a string I need to cast each and every column.
Is it necessary to do so? Is there a better way?
The second (unmarked) answer in this question concatenates the values smartly using xml and cross apply.
Is there a way to expand it to include the column names so that the final result would be:
'id=1;name=myname;amount=100;' etc....
Yes, you need to cast non-character data to a string before you can concatenate it. You might want to use CONVERT instead for data that is susceptible to regional formatting (I'm specifically thinking dates here) to ensure you get a deterministic result. You also need to handle nullable columns, i.e. ISNULL(CAST(MyColumn AS VARCHAR(100)), '') - if you concatenate a NULL it will NULL the whole string.
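For instance, a hedged version of the SELECT above with a date column added (MyDateColumn is invented here for illustration) might look like this:
SELECT
    'mytable',
    'id=' + ISNULL(CAST(id AS NVARCHAR(50)), '')
          + ';created=' + ISNULL(CONVERT(NVARCHAR(30), MyDateColumn, 126), '') -- style 126 = ISO 8601, culture-independent
          + ';name=' + ISNULL(name, '')
FROM
    inserted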
Why not just save it as xml?
select * from inserted for xml auto
It usually takes less space than a simple text column, and you can treat it as (relatively) normalized data. And most importantly, you don't have to handle converting all the complicated stuff manually (how does your code handle line breaks, quotes, ...?).
In fact, you can even add indices to xml columns, so it can even be practical to search through. Even without indices, it's going to be much faster searching e.g. all records changed in mytable that set name to some value.
And of course, you can keep the same piece of code for all your audited tables - no need to keep them in sync with the table structures etc. (unless you want to select only some explicit columns, of course).
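A sketch of what that could look like in the trigger, assuming the audit.changes column can hold nvarchar(max) or xml (the trigger name is made up):
CREATE TRIGGER trg_mytable_audit ON mytable
AFTER INSERT
AS
BEGIN
    -- one xml document per statement, covering every inserted row
    INSERT INTO audit (tablename, changes)
    SELECT 'mytable', (SELECT * FROM inserted FOR XML AUTO);
END;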

Creating SQL Server JSON Parsing/Query UDF

First of all before I get into the question, I'll preface this with the fact that I know that this is a "bad" idea. But for business reasons it is something that I have to come up with a solution to, and I'm hoping that someone, somewhere might have some ideas on how to go about this.
I have a SQL Server 2008 R2 table that has an "OtherProperties" column. This column contains various other, somewhat arbitrary pieces of information that relate to the records. There is a business need to create a UDF that we can use to query the results. For example:
SELECT *
FROM MyTable
WHERE MyUDFGetValue(myTable.OtherProperties, "LinkedOrder[0]") IS NOT NULL
This would find a record where there was an array of LinkedOrder entries that contained a value at index 0
SELECT *
FROM MyTable
WHERE MyUDFGetValue(myTable.OtherProperties, "SubOrder.OrderId") = 25
This would find a property "orderId" and use its value in a comparison.
Has anyone seen an implementation of this? I've seen implementations of functions, like this JSONParser, that shred the values into a table, which just will not get us what we need query-wise. Complexity-wise, I don't want to write a full-fledged JSON parser, but I can if I need to.
Not sure if this will suit your needs but I read about a CLR JSON serializer/deserializer. You can find it here, http://www.sqlservercentral.com/articles/CLR/74160/
It's been a long time since you asked your question, but there is now a solution you can use: JSON Select, which provides various functions for different datatypes, for example the JsonInt() function. From one of your examples (assuming OrderId is an int; if not, you could use a different function):
SELECT *
FROM MyTable
WHERE dbo.JsonInt(myTable.OtherProperties, 'SubOrder.OrderId') = 25
DISCLOSURE:
I am the author of JSON Select, and as such have an interest in you using it :)
If you cannot use SQL Server 2016 with built-in JSON support, you would need to use CLR e.g. JSONselect, json4sql, or custom code such as http://www.codeproject.com/Articles/1000953/JSON-for-SQL-Server-Part, etc.
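For reference, on SQL Server 2016 and later the built-in functions cover both examples from the question directly (shown here only for comparison, since the question is about 2008 R2):
SELECT *
FROM MyTable
WHERE TRY_CAST(JSON_VALUE(OtherProperties, '$.SubOrder.OrderId') AS INT) = 25;

-- JSON_VALUE returns scalars; if LinkedOrder[0] is an object, use JSON_QUERY instead
SELECT *
FROM MyTable
WHERE JSON_VALUE(OtherProperties, '$.LinkedOrder[0]') IS NOT NULL;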

Is there an implementation of Tom Kyte's STRAGG function supporting SQL Server?

The STRAGG function implementation returns a result as a single column value. The implementation for Oracle seems pretty generic and can be used with different tables (and relationships). Could similar behavior be achieved in SQL Server? A search on the web appears to return only hard-coded implementations, not a generic one. Is there a known solution for SQL Server?
There is a nice XML solution to this that is widely used. It's simplest if the strings you're aggregating contain no XML-invalid or XML-special characters; here's an example.
SELECT *
FROM
(
SELECT x AS [data()]
FROM
(
SELECT 'something'
UNION ALL
SELECT 'something else'
UNION ALL
SELECT 'something & something'
) y (x)
FOR XML PATH('')
) z (final)
This example is from Tony Rogerson's post at http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/07/06/871.aspx
You can do much more than this simple example shows. You can specify the ordering of the items within the aggregates (put ORDER BY in the derived table), you can group and join so you get more than one result row, you can change delimiters, and so on. Here are a couple of other links about this technique:
http://blogs.technet.com/wardpond/archive/2008/03/15/database-programming-the-string-concatenation-xml-trick-revisited-or-adam-is-right-but-we-can-fix-it.aspx
http://web.archive.org/web/20150328021904/http://sqlblog.com/blogs/adam_machanic/archive/2009/05/31/grouped-string-concatenation-the-winner-is.aspx
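As an illustration of the grouped, ordered, comma-delimited variant described above (the Customers and Orders tables are invented for this sketch; the TYPE directive plus .value() also avoids the entity-escaping issue with '&' noted earlier):
SELECT c.CustomerId,
       STUFF((SELECT ', ' + o.ProductName
              FROM Orders AS o
              WHERE o.CustomerId = c.CustomerId
              ORDER BY o.ProductName
              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'),
             1, 2, '') AS Products -- strip the leading ', '
FROM Customers AS c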
Anith Sen did what I think is the most comprehensive answer to this question for SQL Server in his article Concatenating Row Values in Transact-SQL (31 July 2008):
http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/
You'll see that there are a number of different techniques for doing this, but I reckon the XML trick is the fastest. Before SQL Server 2005, we used variations of the UPDATE trick that I show as a comment in the article.
