Text Manipulation in SQL

Text Manipulation in SQL - sql-server

I have a column named body(ntext,null). Basically anything in the body of the message will come out as one string of text. See example:
Report Count SITE Type ACCOUNT NUMBER STMT CD COLL SCHEME Previously Touched Resi Aging 98 Cleveland - 609 Former 22449903 1 RQ-1 1160201
I want the result to look like this:
Report Count SITE Type ACCOUNT NUMBER STMT CD
98 Cleveland - 609 Former 22449903 1 RQ-1 1160201
How can I get this output? Would it be easier to do in Excel using VBA versus SQL?
I am not an expert in SQL. I am still learning.

You COULD try to get this out of SQL, but I think most would agree that SQL is not designed for extensive text formatting.
As a DBA, I would steer you towards making those fields discrete if possible using normalization or at the very least having a key/value pair table rather than a blob of text that represents both fields and data.
You could also consider a datatype of XML if you find that you need to store different fields and different responses for each row.
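If you do move the formatting out of SQL, the split itself is simple in client code. Below is a minimal Python sketch that separates the run-together line into column names and values. It rests on one assumption that you must check against your real data: the first token made up only of digits (`98` in the example) marks where the values begin. The function name `split_header_and_values` is hypothetical.

```python
from typing import Tuple


def split_header_and_values(body: str) -> Tuple[str, str]:
    """Split one run-together line into (column names, values).

    Heuristic (an assumption, not a general rule): the first token made up
    only of digits marks where the values start.
    """
    tokens = body.split()
    for i, token in enumerate(tokens):
        if token.isdigit():
            return " ".join(tokens[:i]), " ".join(tokens[i:])
    # No numeric token found: treat the whole line as column names
    return " ".join(tokens), ""


header, values = split_header_and_values(
    "Report Count SITE Type ACCOUNT NUMBER STMT CD COLL SCHEME "
    "Previously Touched Resi Aging 98 Cleveland - 609 Former 22449903 1 RQ-1 1160201"
)
print(header)
print(values)
```

The heuristic breaks as soon as a column name is itself a number or the first value is not numeric, which is exactly why storing the fields discretely, as suggested above, is the more robust fix.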

Related

SQL Server Problem with carriage splitting a string

DECLARE @Description2 VARCHAR(MAX);
SELECT @Description2 = '1. Each week there will be a philosophical question to address. For example, you will address questions such as do we have freewill or are we determined?, what does quantum physics tell us about the nature of reality?, and what are the philosophical implications of Darwinian evolution? Utilizing the readings for the week prepare a written essay response. There will be a minimum of ten mini-essays for the term and each must be a minimum of 200 words.
2. Develop an argument on the topic of ontology, focusing specifically on the question "Are we just the brain?" Argue either the materialist position (we are just the brain) or the non-materialist position (we are not just the brain), drawing from the primary writings of the philosophers. Be sure to explain both positions in your essay and then make the case for the position you are supporting. This argumentative essay needs to be at least 750 words in length. We will then conduct an in class debate and you will need to argue your point in a debate setting.
3. Develop an argument in the area of ethics, arguing for or against animal rights. Make sure to utilize primary writings in the construction of your argument. This argumentative essay needs to be at least 750 words in length. Students will present their position to the class in a ten minute oral presentation.
4. Having read the writings of Epictetus and Sartre compare and contrast Stoicism and Existentialism. Write a 750 word essay highlighting the key differences and similarities.
5. Analyze the primary readings of Nietzsche in journal form. Choose 10 separate passages to analyze and include the following: a) a summary of the passage; b) an interpretation or analysis of the argument; c) a comparison and/or contrast to the ideas of another philosopher or philosophy; d) personal insight into the writing by applying the ideas to you or to the world at large (its meaning on a deeper, more personal level). This journal will be at least 1000 words in length. ';
-- Strip HTML fragments; '<br />.<br />' is replaced before '<br />' so it can still match.
-- The fourth pattern appears garbled in the original post; '&nbsp;' is assumed here.
SET @Description2 = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Description2,'<br />.<br />',''),'<p>',''),'</p>',''),'<br />',''),'&nbsp;',''),'<div>',''),'</div>','');
-- Turn every CR/LF pair into a '???' marker (CONCAT needs SQL Server 2012 or later)
SET @Description2 = CONCAT('???', REPLACE(@Description2, CHAR(13) + CHAR(10), '???'));
-- dbo.DelimitedSplit8K_LEAD is a community-written function, not a built-in.
-- If your copy declares the delimiter as CHAR(1), '???' is truncated to '?', so any '?' in the text also splits.
SELECT LTRIM(s.Item)
FROM dbo.DelimitedSplit8K_LEAD(@Description2, '???') s
WHERE LTRIM(s.Item) <> '';

So what do you actually want to happen? And what are you seeing that is different from what you want?
In SSMS, if you go to Tools > Options > Query Results > SQL Server > Results to Grid, there is a checkbox labelled "Retain CR/LF on copy or save", which determines how varchar data is treated when you click on the output grid results and copy the data.
If you check the checkbox, the carriage return/linefeed will be retained. For example, take this query:
select 'a
b'
Run that query, copy the result from the grid and paste it into, say, Notepad, and you will get two lines of text.
On the other hand, if you don't check the checkbox, you will get only a single line in Notepad.
Be aware that if you change the setting of this checkbox, I believe you will need to open a new query window to see the new behaviour.
Note that this only controls the behaviour of SSMS. It has nothing to do with how your data is "really" stored, for example, if you insert it into a table. If you have an application reading data from a table, the way it formats the output is up to the application.
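Because the CR/LF characters stay in the data whatever the SSMS setting, an application can do the clean-up and split itself. The following is a minimal Python sketch mirroring the T-SQL in the question: strip the HTML fragments, split on line breaks, trim, and drop blank lines. The fragment list is taken from the question except `&nbsp;`, which is an assumption because that entry appears garbled in the post; the name `split_description` is hypothetical.

```python
from typing import List

# Fragments removed before splitting. "<br />.<br />" comes before "<br />"
# so it is not broken up first; "&nbsp;" is an assumed entry.
HTML_FRAGMENTS = ["<br />.<br />", "<p>", "</p>", "<br />", "&nbsp;", "<div>", "</div>"]


def split_description(text: str) -> List[str]:
    """Strip the listed HTML fragments, split on line breaks, drop blank lines."""
    for fragment in HTML_FRAGMENTS:
        text = text.replace(fragment, "")
    # splitlines() treats \r\n, \r and \n (plus a few Unicode separators) as breaks
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = "1. First item?\r\n<p>2. Second<br /> item</p>\r\n\r\n   3. Third"
for line in split_description(sample):
    print(line)
```

Splitting directly on line breaks also avoids inventing a marker string such as `'???'` that might collide with characters already present in the text.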

SQL Server validating postcodes

I have a table containing postcodes, but there is no validation built into the entry form, so there is no consistency in the way they are stored in the database. Sample below:
ID Postcode
001742 B5
001745
001746
001748 DY3
001750
001751
001768 B276LL
001774 B339HY
001776 B339QY
001780 WR51DD
I want to use these postcode to map the distance from a central point but before I can do that I need to put them into a valid format and filter out any blanks or incomplete postcodes.
I had considered using
left(postcode,3) + ' ' + right(postcode,3)
To correct the formatting but this wouldn't work for postcodes like 'M6 8HD'
My aim is to get the list of postcodes into a valid format, but I don't know how to account for postcodes of different lengths. Is there a way to do this in SQL Server?

As discussed in the comments, sometimes looking at a problem the other way around presents a far simpler solution.
You have a list of arbitrary input provided by users, which frequently doesn't contain the correct spacing. You also have a list of valid postcodes which are correctly spaced.
You're trying to solve the problem of finding the correct place to insert spaces into your arbitrary inputs to make them match the list of valid codes, and this is extremely difficult to do in practice.
However, performing the opposite task - removing the spaces from the valid postcodes - is remarkably easy to do. So that is what I'd suggest doing.
In our most recent round of data modelling, we have modelled addresses with two postcode columns - PostCode containing the postcode as provided from whatever sources, and PostCodeNoSpace, a computed column which strips whitespace characters from PostCode. We use the latter column for e.g. searches based on user input. You may want to do something similar with your list of Valid postcodes, if you're keeping it around permanently - so that you can perform easy matches/lookups and then translate those matches back into a version that has spaces - which is actually a solution to the original question posed!
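The "remove the spaces from the valid list" idea can be sketched in a few lines of Python. This is a minimal illustration, assuming you have a list of correctly spaced valid postcodes; the sample list below is hypothetical, and upper-casing the input is an extra assumption not mentioned above. `strip_spaces` plays the role of the `PostCodeNoSpace` computed column, and the dictionary translates a match back to the spaced form.

```python
import re
from typing import Dict, Iterable, Optional


def strip_spaces(code: str) -> str:
    """Remove all whitespace and upper-case, like a PostCodeNoSpace column."""
    return re.sub(r"\s+", "", code).upper()


def build_lookup(valid_postcodes: Iterable[str]) -> Dict[str, str]:
    """Map the space-free form of each valid postcode back to its spaced form."""
    return {strip_spaces(pc): pc for pc in valid_postcodes}


def normalise(user_input: Optional[str], lookup: Dict[str, str]) -> Optional[str]:
    """Return the correctly spaced postcode, or None for blank or unknown input."""
    if user_input is None or not user_input.strip():
        return None
    return lookup.get(strip_spaces(user_input))


# Hypothetical list of valid postcodes
lookup = build_lookup(["B27 6LL", "B33 9HY", "B33 9QY", "WR5 1DD", "M6 8HD"])
for raw in ["B276LL", "m68hd", "B5", "", None]:
    print(repr(raw), "->", normalise(raw, lookup))
```

Blank and incomplete entries such as `B5` fall out naturally as `None`, which covers the filtering part of the question without any length-dependent string slicing.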

SSIS Expression to Isolate a String

I am building an SSIS package that will populate data from an Excel Spreadsheet into our Database for Reporting.
The customer did not provide an individual column for the city and, unfortunately, cannot update their export file to add it, so I am trying to build a city column from the branch names.
I need an SSIS expression (or several) to use in a Derived Column transformation to pull the name of the city out of the branch name. The issue I have is that the spacing and placement of the names vary. I have tried TOKEN, SUBSTRING, RIGHT and LEFT combined with other expressions, and I always seem to cut something off.
Has anyone else run into this, and how can I fix it? (I am not familiar enough with C# to use a Script Component.)
Here is a Sample of the Data that I have.
Branch Name
JS OMAHA - 09
JS SIOUX FALLS - 48
JS DOWNINGTOWN - 53
JS ST PAUL - 70
JS BLOOMINGTON - 103
JS PITTSBURGH NORTH -149-
JS TINTON FALLS - 186
JS BLAINE - 337
JS ROCHESTER MN - 423

Do you have a list of valid cities sitting in a table? If so you can use a lookup transformation.
Let's say your list of cities is in a table called city.
On the General tab, pick No Cache.
On the Connection tab, pick the city table.
On the Columns tab, match the Branch Name column to the city column in your city table.
On the Advanced tab, tick Modify the SQL statement, change the end to WHERE ? LIKE '%' + [city] + '%' (so the city must appear somewhere inside the branch name), and map the ? parameter to Branch Name.
Now your lookup will find a matching city and pass it through as an extra column (if several cities match, only one is returned).
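The matching rule behind that lookup can be tried out without SSIS. Below is a minimal Python sketch that uses SQLite as a stand-in for SQL Server (SQLite concatenates with `||` where SQL Server uses `+`). The city list is hypothetical, and ordering by `LENGTH(city) DESC` is an addition of this sketch so that a longer name such as `SIOUX FALLS` wins over a shorter one such as `FALLS` when both are listed.

```python
import sqlite3
from typing import Dict, Iterable, Optional


def match_cities(branch_names: Iterable[str],
                 cities: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the longest listed city contained in each branch name (or None)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE city (city TEXT NOT NULL)")
        conn.executemany("INSERT INTO city (city) VALUES (?)", [(c,) for c in cities])
        matches: Dict[str, Optional[str]] = {}
        for branch in branch_names:
            # The branch name is the parameter; each city is wrapped in wildcards
            row = conn.execute(
                "SELECT city FROM city WHERE ? LIKE '%' || city || '%' "
                "ORDER BY LENGTH(city) DESC LIMIT 1",
                (branch,),
            ).fetchone()
            matches[branch] = row[0] if row else None
        return matches
    finally:
        conn.close()


result = match_cities(
    ["JS OMAHA - 09", "JS SIOUX FALLS - 48", "JS PITTSBURGH NORTH -149-", "JS BLAINE - 337"],
    ["OMAHA", "SIOUX FALLS", "FALLS", "PITTSBURGH"],
)
for branch, city in result.items():
    print(f"{branch!r} -> {city!r}")
```

Branches whose city is missing from the table (`BLAINE` here) come back as `None`, which makes gaps in the city list easy to spot.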
The other way is to load it all into a staging table and do an UPDATE, also using LIKE.
Whatever you do, it will help to have a list of valid cities in a table.
A third option is to make assumptions about the tokens in the data and use string functions in a Derived Column transformation to extract the city, but you can get some unexpected results.
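To show what "assumptions about the tokens" means in practice, here is a minimal Python sketch of one such rule: drop the `JS ` prefix, cut at the first hyphen, and trim. The rule and the name `city_from_branch` are assumptions of this sketch. A roughly equivalent, untested Derived Column expression would be `TRIM(SUBSTRING([Branch Name], 4, FINDSTRING([Branch Name], "-", 1) - 4))`, which fails for names without a hyphen.

```python
def city_from_branch(branch: str, prefix: str = "JS ") -> str:
    """Token-based guess: drop the prefix, cut at the first hyphen, trim."""
    name = branch.strip()
    if name.startswith(prefix):
        name = name[len(prefix):]
    # Everything from the first "-" onwards is assumed to be the branch number
    return name.split("-", 1)[0].strip()


for branch in ["JS OMAHA - 09", "JS ST PAUL - 70",
               "JS PITTSBURGH NORTH -149-", "JS ROCHESTER MN - 423"]:
    print(f"{branch!r} -> {city_from_branch(branch)!r}")
```

The output illustrates the "unexpected results": `PITTSBURGH NORTH` and `ROCHESTER MN` keep extra words that are not part of the city name, and a hyphenated city name would be cut short.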
I can expand further on these if you wish but I won't waste time if you're never going to return to the question.

Whilst you stated that you are not familiar with Script Components, they are the correct tool for the job. You will get much greater flexibility by using C# (or VB.NET) code to manipulate your strings. There are a number of good tutorials online showing how to use a Script Component, and plenty of information about string manipulation in C#.

Using text in a column of Date/Time type in access

I have a column in MS Access in which the data could be any of the following:
A date
Text string: "n/a"
Text string: "n/e"
The vast majority of entries will be dates, but a very few will need to be these specified text strings. I would like to still be able to perform date calculations on the column. What's the best data type to use?

In my opinion the best approach would be to leave the date field as Date/Time and then add another field to indicate the status if the Date/Time field is Null. Something like:
DateField   DateStatus
----------  ----------
2014-09-21
            n/a
2014-09-23
2014-09-25
            n/e
You could use a single Text field, but then any time you wanted to use the field value as a proper Date/Time value you'd have to convert it using CDate(). You would also have the possibility of other junk getting in there, or dates getting entered in different formats (e.g. d/m/yyyy vs. m/d/yyyy). And finally, you would lose the ability to easily determine whether a Date/Time value is in a particular row (which in my approach would simply be ... WHERE DateField IS [NOT] NULL).
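The two-column design can be sketched with SQLite, used here only as a stand-in for Access (SQLite has no true Date/Time type, so dates are stored as ISO-8601 text and compared with `julianday`; in Access you would use a Date/Time column and `DateDiff`). The `CHECK` requiring exactly one of the two columns to be filled in is an extra assumption of this sketch; the table name `tracking` is hypothetical.

```python
import sqlite3


def create_tracking_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tracking (
            id         INTEGER PRIMARY KEY,
            DateField  TEXT,  -- ISO-8601 date string; NULL when no date applies
            DateStatus TEXT CHECK (DateStatus IN ('n/a', 'n/e')),
            -- assumption: exactly one of the two columns is filled in
            CHECK ((DateField IS NULL) <> (DateStatus IS NULL))
        )
        """
    )
    conn.executemany(
        "INSERT INTO tracking (DateField, DateStatus) VALUES (?, ?)",
        [("2014-09-21", None), (None, "n/a"), ("2014-09-23", None),
         ("2014-09-25", None), (None, "n/e")],
    )
    return conn


conn = create_tracking_table()
# Date arithmetic only touches rows that actually hold a date
days = conn.execute(
    "SELECT DateField, julianday(DateField) - julianday('2014-09-21') "
    "FROM tracking WHERE DateField IS NOT NULL ORDER BY id"
).fetchall()
print(days)
```

Junk such as `'n/x'`, or a row with neither a date nor a status, is rejected by the constraints instead of silently landing in the column.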

I agree with Gord Thompson's answer - mainly because it's so non-intuitive to have, essentially, two completely different types of data in a single column, and because it's going to make validation/data integrity stuff so much harder with little upside - and, as he indicates with the CDate() reference, dates basically only work reliably like dates if they're in a "date/time" field. Microsoft has a page on choosing a data type that explains some of the Access-specific differences in more detail.
I also suggest that you don't actually use a text field for those "comments," since you say there are only a handful of potential options: use a Long Integer and link it to a separate table with the list of allowable entries. This will allow you to run reports more easily, change the "display text" in one step instead of potentially dozens of times, etc. It also saves a relatively small amount of space per record (a Long Integer is 4 bytes; a Text field can hold up to 255 characters).
You can also do fun data/reporting stuff with that Comment (Long Integer) field and the dates, even combining them into ranges: queries let you use the two different columns to create a single answer. I have a report that's grouped so that you can see stats for everything that's active (by the quarter in which it starts), plus everything that's pending (with the code indicating who's responsible for watching the record), plus everything that's not pending but still doesn't have a start date (with the reason code displayed), plus everything that's expired (by the quarter in which it ended). It looks like each of those things is in a single column in the report, but it's actually about five columns that have been concatenated with the IIf function.
(Almost every argument I can come up with boils down to "this is what relational databases are all about and why they're so awesome.")
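The lookup-table idea and the "single answer from two columns" trick can be sketched together. This minimal Python example again uses SQLite as a stand-in for Access, with `CASE` playing the role of Access's `IIf`; the table names `StatusCode` and `Tracking` and the sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def create_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(
        """
        CREATE TABLE StatusCode (
            StatusID    INTEGER PRIMARY KEY,  -- stands in for the Long Integer key
            DisplayText TEXT NOT NULL
        );
        CREATE TABLE Tracking (
            ID        INTEGER PRIMARY KEY,
            StartDate TEXT,                   -- ISO-8601 date or NULL
            StatusID  INTEGER REFERENCES StatusCode (StatusID)
        );
        INSERT INTO StatusCode (StatusID, DisplayText) VALUES (1, 'n/a'), (2, 'n/e');
        INSERT INTO Tracking (StartDate, StatusID) VALUES
            ('2014-09-21', NULL), (NULL, 1), (NULL, 2);
        """
    )
    return conn


def shown_values(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """One display column built from two real columns (CASE plays the role of IIf)."""
    return conn.execute(
        """
        SELECT t.ID,
               CASE WHEN t.StartDate IS NOT NULL THEN t.StartDate
                    ELSE s.DisplayText END
        FROM Tracking AS t
        LEFT JOIN StatusCode AS s ON s.StatusID = t.StatusID
        ORDER BY t.ID
        """
    ).fetchall()


conn = create_demo()
print(shown_values(conn))
# Renaming a status is a single UPDATE on the lookup table
conn.execute("UPDATE StatusCode SET DisplayText = 'not applicable' WHERE StatusID = 1")
print(shown_values(conn))
```

Changing the display text touches one row in `StatusCode`, and every record that uses that code picks up the new wording.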

Make SQL Server index small numbers

We're using SQL Server 2005 in a project. The users of the system have the ability to search some objects by using 'keywords'. The way we implement this is by creating a full-text catalog for the significant columns in each table that may contain these 'keywords' and then using CONTAINS to search for the keywords the user inputs in the search box in that index.
So, for example, let's say you have a Movie object and you want to let the user search for keywords in the title and plot of the movie; then we'd index both the Title and Plot columns, and then do something like:
SELECT * FROM Movies WHERE CONTAINS(Title, @keywords) OR CONTAINS(Plot, @keywords)
(It's actually a bit more advanced than that, but nothing terribly complex)
Some users are adding numbers to their search, so for example they want to find 'Terminator 2'. The problem here is that, as far as I know, by default SQL Server won't index short words, thus doing a search like this:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"')
is actually equivalent to doing this:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator"') -- notice the missing '2'
and we are getting a plethora of spurious results.
Is there a way to force SQL Server to index small words? Preferably, I'd rather index only numbers like 1, 2, 21, etc. I don't know where to define the indexing criteria, or even if it's possible to be as specific as that.
Update: I did that and removed the noise words from the list, and now the behaviour is a bit different, but still not what you'd expect.
When I search for "Terminator 2" (I'm just making this up, since my employer might not be happy if I disclose what we are doing; the terms are different but the principle is the same), I don't get anything, even though I know there are objects containing both words.
Maybe I'm doing something wrong? I removed all numbers 1 ... 9 from my noise configuration for ENG, ENU and NEU (neutral), regenerated the indexes, and tried the search.

These "small words" are considered "noise words" by the full text index. You can customize the list of noise words. This blog post provides more details. You need to repopulate your full text index when you change the noise words file.

I knew about the noise words file, but I'm not sure why your "Terminator 2" example is still giving you issues. You might want to try asking this on the MSDN Database Engine forum, where people who specialize in this sort of thing hang out.

You can combine CONTAINS (or CONTAINSTABLE) with simple where conditions:
SELECT * FROM Movies WHERE CONTAINS(Title, '"Terminator 2"') and Title like '%Terminator 2%'
While CONTAINS finds every 'Terminator' title, the LIKE condition in the WHERE clause eliminates 'Terminator 1'.
The engine should be smart enough to evaluate the CONTAINS predicate first rather than starting with the LIKE condition.
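The two-stage idea can be simulated in plain Python to see why it helps. This is a crude model, not how SQL Server's full-text engine actually works: `index_terms` drops words from a hypothetical noise list (including single digits, as in the question), `contains_all` stands in for `CONTAINS`, and the substring test stands in for `LIKE '%Terminator 2%'` under a case-insensitive collation.

```python
import re
from typing import List

# Hypothetical noise list: single digits are treated as noise, as in the question
NOISE_WORDS = {"a", "an", "the", "1", "2", "3"}


def index_terms(text: str) -> List[str]:
    """Lower-cased word tokens with noise words dropped (a rough model of indexing)."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in NOISE_WORDS]


def contains_all(title: str, query: str) -> bool:
    """Crude stand-in for CONTAINS: every non-noise query term is in the title."""
    title_terms = set(index_terms(title))
    return all(term in title_terms for term in index_terms(query))


def search(titles: List[str], query: str) -> List[str]:
    # Stage 1: the "full-text" filter, which ignores the noise word "2"
    candidates = [t for t in titles if contains_all(t, query)]
    # Stage 2: the LIKE '%...%' filter, modelled as a case-insensitive substring test
    return [t for t in candidates if query.lower() in t.lower()]


titles = ["Terminator 1", "Terminator 2", "The Terminator", "Terminator 21"]
print([t for t in titles if contains_all(t, "Terminator 2")])
print(search(titles, "Terminator 2"))
```

Note that `Terminator 21` survives the second stage too: `LIKE '%Terminator 2%'` is a substring match, so it narrows the full-text results without giving exact word matching.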
