I'm looking for a way to remove special characters from a field within my Access database. The field has both text and numbers along with dashes, underscores and periods. I want to keep the letters and numbers but remove everything else. There are multiple examples of VB scripts, and some in SQL, but the SQL examples I've seen are very lengthy and do not seem very efficient.
Is there a better way to write a SQL script to remove these characters without having to list each of the special characters such as the below example?
SELECT REPLACE([PolicyID],'-','')
FROM RT_PastDue_Current;
If you are actually manipulating the data and executing code from the context of the MS Access application, then SQL calls can call any public function inside the modules in the MDB. You could write a cleanup function, then
UPDATE Mytable SET MyField=Cleanup(MyField)
Other than that, I have yet to encounter any RDBMS engine that offers string manipulation features much more advanced than the simple Replace you've mentioned.
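The logic such a cleanup function needs is small. Here is a minimal sketch of it, written in Python rather than VBA purely for illustration; the name cleanup and the ASCII-only definition of "letters and numbers" are assumptions, not something from the question.

```python
import re

# Matches any character that is not an ASCII letter or digit (assumed definition).
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]")

def cleanup(value):
    """Return value with every non-alphanumeric character removed.

    None is passed through unchanged, mirroring a NULL field.
    """
    if value is None:
        return None
    return _NON_ALNUM.sub("", value)

print(cleanup("AB-12_34.C"))  # AB1234C
```

A VBA version inside an Access module would follow the same shape: loop over (or pattern-match) the characters and keep only those in the allowed set, so no individual special character ever has to be listed.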
Right now, I run a stored procedure whose output feeds a "Create CSV Table" Data Operations component. This component, not surprisingly, outputs a comma-delimited list of fields, which is not supported by our remote system. The fields need to be tab-delimited. One would think that the Data Operations component would have a tab (or other character-delimited option). But no, only commas are available, and no other Data Operations component outputs a tab-delimited table.
Using any mechanism for which we'd have to write code is completely the last option, as there's no need for code to use CSV. Also, any mechanism which requires paying for 3rd party components is categorically out, as is using any solution which is in preview mode.
The only option we've thought of is to revamp the stored procedure so that it outputs a single "column" containing the tab-delimited columns, and then write that to a file: ostensibly a comma-delimited file, but one with no embedded commas (which is acceptable for my system), so that the single column isn't itself quoted.
Otherwise, I guess Function Apps is the solution. Anyone with ideas?
The easiest way is to use a string function and replace the comma with another delimiter. If that approach is acceptable, then after creating the CSV table, initialize a string variable with the expression replace(body('Create_CSV_table_2'),',',' '), where the third argument is the replacement delimiter.
And this is the result.
If that approach is not acceptable, then yes, you will have to solve it with code, and a Function App is one option.
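One caveat with a plain replace is that it also rewrites commas inside quoted fields. If the code route is taken, the conversion itself is short. The following is a rough sketch in Python, assuming the CSV text is available as a string; the function name csv_to_tsv and the sample data are made up for illustration.

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Re-emit CSV text as tab-delimited text, parsing quoted fields properly."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Hypothetical sample with a comma embedded in a quoted field.
sample = 'id,name\n1,"Smith, John"\n'

# A plain replace splits the quoted field into two columns.
naive = sample.replace(",", "\t")

# Parsing first keeps "Smith, John" as a single field.
converted = csv_to_tsv(sample)
```

If the data is known never to contain embedded commas, as the question suggests may be the case, the simple replace gives the same result.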
I have some text data in an SQL Server 2014 table in which I want to detect complex patterns and extract certain portions of the text if the text matches the pattern. Because of this, I need capturing groups.
E.g.
From the text
"Some title, Some Journal name, vol. 5, p. 20-22"
I want to grab the volume number
, vol\. ([0-9]+), p\. [0-9]+
Mind that I have simplified this use-case to improve readability. The above use-case could be solved without capturing groups. The actual use-case handles a lot more exceptions, like:
The journal/title containing "vol.".
Volume numbers/pages containing letters
"vol" being followed by ":" or ";" instead of "."
...
The actual regex I use is the following (yet, this is not a question on regex structure, just elaborating on why I need capturing groups).
(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)
As far as I know, there are two ways of getting Regex functionality into SQL Server.
Through CLR: https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/ . Yet, this example (from 2009) does not support groups. Are there any commonly used solutions out there that do?
By installing Master Data Services
Since installing and setting up the entire Master Data Services package felt like overkill to get some Regex functionality, I was hoping there'd be an easy, common way out...
I have found a CLR implementation that is super easy to install, and includes Regex capturing group functions.
http://www.sqlsharp.com/
I have installed this in a separate database called 'SQL#' (simply by using the provided installation .sql script), and the functions are located inside a schema with the same name. As a result I can use the function as follows:
select SQL#.SQL#.RegEx_CaptureGroup( 'test (2005) test', '\((20[012][0-9]|19[5-9][0-9])\)', 1, NULL, 1, -1, '');
Would be nice if this was included by default in SQL Server...
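For anyone who wants to check what a capturing group should return before wiring it into SQL, here is a small sketch using Python's re module. It is not the SQL# function, only an illustration of the same idea. It uses the pattern from the question (reading the separator character as §) and assumes case-insensitive matching; the helper name volume_of is hypothetical.

```python
import re

# Pattern from the question; group 2 is the volume number.
VOLUME = re.compile(r"(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)",
                    re.IGNORECASE)

def volume_of(text):
    """Return the captured volume string, or None when the pattern does not match."""
    match = VOLUME.search(text)
    return match.group(2) if match else None

print(volume_of("Some title, Some Journal name, vol. 5, p. 20-22"))  # 5
```

The group index passed to a CLR function such as RegEx_CaptureGroup plays the same role as the argument to group() here.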
I just came across this when looking into someone else's code.
Say there is a schema called Books that has a table called Genres. Whenever this schema and table are used in a script (such as batch or Perl), they are written as Books..Genres.
The question is: should it stay like this, or be changed to Books.Genres? And what is the difference?
First of all, I rarely work outside my default schema and thus rarely list the schema name in my SQL statements. Having said that, on the rare occasions when I do need to access more than one schema, only a single dot is used to separate the schema name from the table name. I checked both DB2 and Oracle: neither allows a double dot. So, unless they are manipulating the SQL in some manner (e.g. maybe the code is processed in a template), SQL statements with a double dot should not work.
MySQL doesn't allow a double dot as separator either; so unless they're preprocessing the SQL in some way as kjpires suggested, this is likely an error. Does the code work?
Is it possible in SQL Server to define a String constant? I am rewriting some queries to use stored procedures and each has the same long string as part of an IN statement [a], [b], [c] etc.
It isn't expected to change, but could at some point in future. It is also a very long string (a few hundred characters) so if there is a way to define a global constant for this that would be much easier to work with.
If this is possible I would also be interested to know if it works in this scenario. I had tried to pass this String as a parameter, so I could control it from a single point within my application but the Stored Procedure didn't like it.
You can create a table with a single column and row and disallow writes on it.
Use that as your global string constant (or additional constants, if you wish).
You are asking for one thing (a string constant in MS SQL), but appear to maybe need something else. The reason I say this is because you have given a few hints at your ultimate objective, which appears to be using the same IN clause in multiple stored procedures.
The biggest clue is in the last sentence:
I had tried to pass this String as a parameter, so I could control it from a single point within my application but the Stored Procedure didn't like it.
Without details of your SQL scripts, I am going to attempt to use some psychic debugging techniques to see if I can get you to what I believe is your actual goal, and not necessarily your stated goal.
Given your Stored Procedure "didn't like that" when you tried to pass in a string as a parameter, I am guessing the composition of the string was simply a delimited list of values, something like "10293, 105968, 501940" or "Juice, Milk, Donuts" (pay no attention to the actual list values - the important part is the delimited list itself). And your SQL may have looked something like this (again, ignore the specific names and focus on the general concept):
SELECT Column1, Column2, Column3
FROM UnknownTable
WHERE Column1 IN (@parameterString);
If this approximately describes the path you tried to take, then you will need to reconsider your approach. Using a regular T-SQL statement, you will not be able to pass a string of parameter values to an IN clause - it just doesn't know what to do with them.
There are alternatives, however:
Dynamic SQL - you can build up the whole SQL statement, parameters and all, then execute that in the SQL database. This probably is not what you are trying to achieve, since you are moving script to stored procedures. But it is listed here for completeness.
Table of values - you can create a single-column table that holds the specific values you are interested in. Then your Stored Procedure can simply use the column from this table for the IN clause. This way, there is no Dynamic SQL required. Since you indicate that the values are not likely to change, you may just need to populate the table once, and use it wherever appropriate.
String Parsing to derive the list of values - you can pass the list of values as a string, then implement code to parse the list into a table structure on the fly. An alternative form of this technique is to pass an XML structure containing the values, and use MS SQL Server's XML functionality to derive the table.
Define a table-valued function that returns the values to use - I have not tried this one, so I may be missing something, but you should be able to define the values in a table-valued function (possibly using a bunch of UNION statements or something), and call that function in the IN clause. Again, this is an untested suggestion and would need to be worked through to determine its feasibility.
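To make the difference concrete, here is a small sketch comparing a delimited string passed to IN with the table-of-values approach. It runs against an in-memory SQLite database from Python, not SQL Server, so it only illustrates the general behaviour; the table names Items and AllowedNames are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (Name TEXT);
    INSERT INTO Items VALUES ('Juice'), ('Milk'), ('Donuts'), ('Bread');

    -- Single-column lookup table holding the "constant" list of values.
    CREATE TABLE AllowedNames (Name TEXT PRIMARY KEY);
    INSERT INTO AllowedNames VALUES ('Juice'), ('Milk'), ('Donuts');
""")

# A bound parameter is one value, so this compares Name against the
# whole string 'Juice, Milk, Donuts' and finds nothing.
as_string = conn.execute(
    "SELECT Name FROM Items WHERE Name IN (?)",
    ("Juice, Milk, Donuts",),
).fetchall()

# Reading the values from the lookup table needs no dynamic SQL.
from_table = conn.execute(
    "SELECT Name FROM Items "
    "WHERE Name IN (SELECT Name FROM AllowedNames) ORDER BY Name"
).fetchall()

print(as_string)   # []
print(from_table)  # [('Donuts',), ('Juice',), ('Milk',)]
```

In a stored procedure the second query is the part that carries over: the IN clause selects from the values table instead of receiving a string.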
I hope that helps (assuming I have guessed your underlying quandary).
For future reference, it would be extremely helpful if you could include SQL script showing your table structure and stored procedure logic so we can see what you have actually attempted. This will considerably improve the effectiveness of the answers you receive. Thanks.
P.S. The link for String Parsing actually includes a large variety of techniques for passing arrays (i.e. lists) of information to Stored Procedures - it is a very good resource for this kind of thing.
In addition to string-constants tables as Oded suggests, I have used scalar functions to encapsulate some constants. That would be better for fewer constants, of course, but their use is simple.
Perhaps a combination - string constants table with a function that takes a key and returns the string. You could even use that for localization by having the function take a 'region' and combine that with a key to return a different string!
I have a table in my MS SQL database where it has some incomplete data in a field. This field in question is a varchar field and has about 1000 characters in the field. This string consists of segmentations of words in the format of a forward slash followed by the segment and then ends with a forward slash (i.e. /p/). Each of these segments would be separated by a space. The problem is that certain of these segmentations do not have the last forward slash (i.e. /p). I need to write a T-SQL script that would correct this problem.
I know I will need to use an UPDATE statement to do that, and I have the WHERE clause too. The problem is what to set the field to. Since the string has about 1000 characters, I don't want to type out the actual string just to correct the problematic segmentation. My question is: is there a "RegEx replace function" that would change only the problematic segmentations and leave the rest of the string alone?
Your help will be greatly appreciated.
Thanks in advance,
Monte
SQL doesn't support RegEx within it. You could write a SQL CLR function then pipe the data through it and if there's a problem with the data correct it then return the corrected version to SQL.
UPDATE YourTable
Set YourColumn = dbo.YourClrProc(YourColumn)
If you have Windows Scripting Host installed (most machines do), then you can use this method to call into the VBScript.RegExp object from T-SQL.
There is REPLACE, but it is nothing close to RegEx.
If this is a one-time operation, then you can consider exporting the table, fixing the data with a tool you're familiar with, like sed or grep, and then importing the modified data back. It will probably be faster and more correct than trying to do this in T-SQL.
On the other hand, if this is a planned maintenance operation that you'll need to repeat often to keep the data clean, then I concur with mrdenny: a CLR function is probably the best choice.
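Whichever route is chosen, the replacement itself is a single pattern. Here is a minimal sketch in Python, assuming segments are separated by whitespace and never contain a slash or a space internally; the function name close_segments is hypothetical.

```python
import re

# A slash followed by segment characters, where the next thing is
# whitespace or the end of the string, i.e. the closing slash is missing.
UNCLOSED = re.compile(r"(/[^/\s]+)(?=\s|$)")

def close_segments(value):
    """Append the missing trailing slash to segments like '/p', leaving '/p/' alone."""
    return UNCLOSED.sub(r"\1/", value)

print(close_segments("/p/ /t /k/ /s"))  # /p/ /t/ /k/ /s/
```

The same pattern and replacement should carry over to sed for the export route or to a .NET Regex.Replace call inside a CLR function, though the syntax for the lookahead and the backreference differs between tools.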