Can I use regex capturing groups in SQL Server 2014? - sql-server

I have some text data in an SQL Server 2014 table in which I want to detect complex patterns and extract certain portions of the text if the text matches the pattern. Because of this, I need capturing groups.
E.g., from the text
"Some title, Some Journal name, vol. 5, p. 20-22"
I want to grab the volume number using a pattern like
, vol\. ([0-9]+), p\. [0-9]+
Note that I have simplified this use case to improve readability; the above could be solved without capturing groups. The actual use case handles a lot more exceptions, like:
The journal/title containing "vol.".
Volume numbers/pages containing letters
"vol" being followed by ":" or ";" instead of "."
...
The actual regex I use is the following (though this is not a question about regex structure; I'm just explaining why I need capturing groups).
(^|§|[^a-z0-9])vol[^a-z0-9]*([a-z]?[0-9]+[a-z]?)
As far as I know, there are two ways of getting Regex functionality into SQL Server.
Through CLR: https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/ . However, this example (from 2009) does not support capturing groups. Are there any commonly used solutions out there that do?
By installing Master Data Services
Since installing and setting up the entire Master Data Services package felt like overkill to get some Regex functionality, I was hoping there'd be an easy, common way out...

I have found a CLR implementation that is super easy to install, and includes Regex capturing group functions.
http://www.sqlsharp.com/
I have installed this in a separate database called 'SQL#' (simply by using the provided installation .sql script), and the functions are located inside a schema with the same name. As a result I can use the function as follows:
select SQL#.SQL#.RegEx_CaptureGroup( 'test (2005) test', '\((20[012][0-9]|19[5-9][0-9])\)', 1, NULL, 1, -1, '');
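Applied to the simplified pattern from the question, a call might look like this (a sketch that assumes the same argument layout as the call above; only the text, pattern and group number change):
-- a sketch, assuming the same argument order as the example call above
select SQL#.SQL#.RegEx_CaptureGroup(
    'Some title, Some Journal name, vol. 5, p. 20-22',  -- input text
    ', vol\. ([0-9]+), p\. [0-9]+',                      -- pattern with one capturing group
    1,                                                   -- capture group to return (the volume number)
    NULL, 1, -1, '');                                    -- remaining arguments kept as in the example above
-- should return '5'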
Would be nice if this was included by default in SQL Server...

Related

Better Way to Remove Special Characters - Access SQL

I'm looking for a way to remove special characters from a field within my Access database. The field has both text and numbers along with dashes, underscores and periods. I want to keep the letters and numbers but remove everything else. There are multiple examples of VB scripts, and some in SQL, but the SQL examples I've seen are very lengthy and do not seem very efficient.
Is there a better way to write a SQL script to remove these characters without having to list each of the special characters such as the below example?
SELECT REPLACE([PolicyID],'-','')
FROM RT_PastDue_Current;
If you are actually manipulating the data and executing code from the context of the MS Access application, then SQL calls can call any public function inside the modules in the MDB. You could write a cleanup function, then
UPDATE Mytable SET MyField=Cleanup(MyField)
Other than that, I have yet to encounter an RDBMS engine that offers much in the way of advanced string manipulation beyond the simple Replace you've mentioned.
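(For reference, if the data lived in SQL Server rather than Access, a comparable cleanup can be hand-rolled in T-SQL. A minimal sketch, with an illustrative function name, that uses PATINDEX to strip anything that is not a letter or digit:)
-- Sketch of a T-SQL cleanup function (not Access SQL); the name dbo.Cleanup is illustrative.
CREATE FUNCTION dbo.Cleanup (@s VARCHAR(255))
RETURNS VARCHAR(255)
AS
BEGIN
    DECLARE @pos INT = PATINDEX('%[^A-Za-z0-9]%', @s);  -- position of first unwanted character
    WHILE @pos > 0
    BEGIN
        SET @s = STUFF(@s, @pos, 1, '');                 -- remove that character
        SET @pos = PATINDEX('%[^A-Za-z0-9]%', @s);       -- look for the next one
    END;
    RETURN @s;
END;
-- usage, mirroring the question's table/column:
-- UPDATE RT_PastDue_Current SET PolicyID = dbo.Cleanup(PolicyID);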

Export Azure SQL Database to XML File

I have searched Google and this site for about 2 hours trying to figure out how to do this, with no luck finding a way that fits or that I understand. As the title says, I need to export table data to an XML file. I have an Azure SQL database with table data.
Table name: District
Table Columns: Id, name, organizationType, address, etc.
I need to take this data and create an XML file that I can save so that it can be given to others.
I have tried using:
SELECT *
FROM dbo.District
FOR XML PATH('districtEntry'), ROOT('leaID')
It gives me the data in XML format, but I don't see a way to save it.
Also, there are some functions I need to be able to perform with the data:
Program should have these options:
1) Export all data.
2) Export all rows created or updated since a specified date.
Files should be named in format ENTITY.DATE.XML, as in
DISTRICT.20150521.XML (use date in YYYYMMDD format).
This leads me to believe I need to write code other than SQL since a requirement would be to query the table for certain data elements as well.
I was wondering if I would need to download any Database Server Data Tools, write code, and if so, in what language, etc. The XML file creation would, I believe, need to be automated after every update of the table or after a query.
I am very confused and in need of guidance as I now have almost given up hope. Please let me know if I need to clarify anything. Thank you.
P.S. I would have given pictures but I do not have enough reputation to supply them.
I would imagine you're looking to write a program in VB.NET or C#, using ADO.NET in either case. Here's an MSDN article with a complete sample of how to connect to and query SQL Azure:
https://msdn.microsoft.com/en-us/library/azure/ee336243.aspx
The example shows how to write the output to the Console, but you could also write the output similarly using something like a StreamWriter to write it to a file.
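Whichever way you run it, the query itself can already cover the "since a specified date" option. A minimal sketch, assuming the District table has a hypothetical lastUpdated column (substitute whatever your table actually uses):
-- Export rows created or updated since a given date as XML
-- (lastUpdated is a hypothetical column name for illustration).
DECLARE @since DATE = '2015-05-21';

SELECT *
FROM dbo.District
WHERE lastUpdated >= @since
FOR XML PATH('districtEntry'), ROOT('leaID');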
You could also create a sqlcmd script to do this, following the guidelines here to connect using sqlcmd:
https://msdn.microsoft.com/en-us/library/azure/ee336280.aspx
Alternatively, if this is a process that does not need to be automated or repeated frequently, you could do it using SSMS:
http://azure.microsoft.com/en-us/documentation/articles/sql-database-manage-azure-ssms/
Running your query through SSMS would produce an XML document, which could be saved using File->Save As.

web2py, microsoft sql server and restricted names for fields

Is there any option to set the words "file", "key" and "trigger" as field names for ms-sql server?
We have a pretty big application written with web2py with PostgreSQL as the DB, and some of the fields have those names. One customer wishes to use MS SQL as the DB server, and I'm trying not to break compatibility within the DB structure.
Using square brackets (found on Google) didn't help (I could not use '[file]'); MS SQL rejected it.
The documentation is pretty clear on this:
Although it is syntactically possible to use SQL Server reserved
keywords as identifiers and object names in Transact-SQL scripts, you
can do this only by using delimited identifiers.
The delimiters used in SQL Server are either double quotes or []. So, you can define them as:
[file]
or
"file"
Note that you need to use the delimiters wherever the identifiers appear.
The use of reserved words for column names is discouraged. However, you might actually have a compatibility use case between different databases where this capability is useful.
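For example, all three words in question work as column names as long as every reference is delimited (a minimal illustration; the table name is made up):
-- Reserved words used as column names; every reference must be delimited.
CREATE TABLE dbo.Attachments (
    [file]    VARCHAR(255),
    [key]     VARCHAR(50),
    [trigger] VARCHAR(50)
);

SELECT [file], [key], [trigger]
FROM dbo.Attachments
WHERE [key] = 'abc';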
I don't know why the square brackets would fail. It works on SQL Fiddle.
I stumbled into this same issue when trying to use a legacy SQL Server table with fields that don't conform to the identifier naming rules, such as 'My File'.
Reading the source code I found that a rname field attribute exists for these cases:
Field('my_file', 'string', rname='[My File]'),
With this, web2py works with the my_file field name, while the actual SQL/DML generated to interact with the database uses [My File] instead.

progress - syntax to modify FIELD

I added a field to a Progress database with
ADD FIELD filedName on TABLEName...
and now I want to change/modify this field (PRECISION or FORMAT or something else...).
What is the correct syntax? I tried:
UPDATE FIELD
MODIFY FIELD
ALTER FIELD
I also tried SQL notation: ALTER TABLE
but nothing works.
Could you please help me with the syntax to modify a field?
If you are using the 4GL engine (you are using _progres or prowin32 to start a session) then you want to use the "data dictionary" tool to create DDL. You run "dict.p" to access that tool. i.e.: _progres dbName -p dict.p
This will allow you to create tables, define fields and indexes etc. If you want to export the definitions you use the "admin" sub-menu to dump a ".df" file. You can manually edit the output but you need to know what you are doing. It is mostly obvious but it is not documented or supported.
Do NOT imagine that using SQL from within a 4GL session will work. It will not. The 4GL engine internally supports a very limited subset of sql-89. It is mostly there as a marketing ploy. There is nothing but pain and agony down that road. Don't go there. If you are using _progres or prowin32 you are using the 4gl engine.
If you are using SQL92 externally (via sqlexp or some other 3rd party SQL tool that uses an ODBC or JDBC connection) then normal SQL stuff should work but you might want to spend some quality time with the documentation to understand the areas where OpenEdge differs from Oracle or Microsoft or whatever sql dialect you are used to.
Tom, thanks for your answer.
I use OpenEdge Release 10.1A02 on Linux.
I can dump a .df file and I can also add a new table from a (similar) .df file.
But why can't I modify any added fields? Of course I can use the "p" editor and do it manually from the menu Tools/Data Editor/Schema and add a new table, but it's risky to tell database administrators to do it manually in each environment (especially in production).
If this syntax exists:
ADD FIELD filedName on TABLEName...
why is there no
MODIFY FIELD filedName on TABLEName... ?
Bartek.
Just in case - here are some working examples of .df files in OE 11.3 (they could be valid in other versions too):
Rename column:
RENAME FIELD "OldName" OF "TableName" TO "NewName"
Other properties:
UPDATE FIELD "FieldName" OF "TableName"
FORMAT "Yes/No"
LABEL "Label"
VALMSG "Validation message..."
Of course the database must be shut down first (apply those changes in single-user mode).

Export tables from SQL Server to be imported to Oracle 10g

I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys, so this should be reasonably straightforward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax standards (i.e. changed nvarchar to varchar2).
Next I exported the data using SQL Server's export wizard, which created a CSV flat file. However, my main issue is that I can't find a way to force SQL Server to double-quote column names. One of my columns contains commas, so unless I can find a method for SQL Server to quote column names, I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm referring to quoting the column values in the CSV. For example, I have a column which contains addresses like
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV
"101 High Street, Sometown, Some
county, PO5TC053"
After looking at some options with SQL Developer, and after trying to export/import manually, I found a utility in SQL Server Management Studio that gets the desired results and is easy to use. Do the following:
Go to the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in Oracle is something you should avoid. It causes all sorts of problems.
As Robert has said, I'd strongly advise against quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure if you are referring to the column names or the data in these columns. (Can SQL Server really have a comma in a column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better, as the need to "escape" quotes becomes another issue). If you're exporting to CSV that should be an option... but then I'm not familiar with the export wizard.
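If the wizard won't add a text qualifier for you, one workaround is to do the quoting in the query you export. A sketch (the table and column names here are made up; adapt to your schema):
-- Wrap string columns in double quotes and escape embedded quotes,
-- so commas inside the value don't break the CSV.
SELECT
    Id,
    '"' + REPLACE(Address, '"', '""') + '"' AS Address
FROM dbo.SomeTable;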
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and their Kettle component. It offered a good range of options to move data from one place to another. It may be a bit oversized for a simple transfer, but if it's a big "migration" with the corresponding volume, it may be a good option.
