SQL Server 2012 Full Text Search on RTF - sql-server

My database runs on SQL Server 2012. One column of my table contains RTF text; its data type is nvarchar(MAX).
I want to set up a full-text search for this column that parses the RTF and searches only the actual text, so that RTF tags don't show up as matches.
As I understand it, RTF parsing should already be part of SQL Server, but I can't get it working :-(
I did the following:
Create a full-text catalog
Select the column containing the RTF and add a full-text index
But I still get wrong results:
SELECT * FROM myTable WHERE
CONTAINS(myRtfColumn,'rtf')
--> still returns all rows, because 'rtf' is an RTF keyword
Any ideas what I'm doing wrong? Do I have to activate RTF search for my SQL Server, or something similar?

Full-text search treats a character column as plain text. By choosing nvarchar you told SQL Server that the column holds plain text, but you are storing a document format (RTF) in it, so the RTF control words are indexed like any other word. A filter is only applied to documents stored in a varbinary(max) (or image/xml) column, so use varbinary(max) for this kind of content instead.
Changing the data type alone does not solve the problem, because the indexer still has to be told how to interpret the rich text: which parts are control words and which are content.
That is the job of the interpreter/filter.
The documentation says:
https://technet.microsoft.com/en-us/en-en/library/ms142531(v=SQL.105).aspx
varbinary(max) or varbinary data
A single varbinary(max) or varbinary column can store many types of documents. SQL Server 2008 supports any document type for which a filter is installed and available in the operative system. The document type of each document is identified by the file extension of the document. For example, for a .doc file extension, full-text search uses the filter that supports Microsoft Word documents. For a list of available document types, query the sys.fulltext_document_types catalog view.
Note that the Full-Text Engine can leverage existing filters that are installed in the operating system. Before you can use operating-system filters, word breakers, and stemmers, you must load them in the server instance, as follows:
What remains to be done:
Check whether a filter for ".rtf" is available:
EXEC sp_help_fulltext_system_components 'filter';
Then add a computed column "Typ" to your table that always returns ".rtf":
alter table yourname add [Typ] AS (CONVERT([nvarchar](8),'.rtf',0));
This column can now be used as the type column when creating the full-text index.
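The steps above could be sketched end to end as follows. This is an untested sketch, not a verified recipe: the catalog name myCatalog, the key index PK_myTable, the new column myRtfBinary and LANGUAGE 1033 are assumptions; myTable, myRtfColumn and Typ come from the question and answer. It also assumes the RTF is single-byte text, so the nvarchar value is converted to varchar before being stored as varbinary(max), since converting nvarchar directly would store UTF-16 bytes.

```sql
-- 1. Confirm that a filter is registered for the .rtf document type.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type = '.rtf';

-- 2. Add a varbinary(max) copy of the RTF (hypothetical column name).
ALTER TABLE myTable ADD myRtfBinary varbinary(max) NULL;
GO

-- Assumes the RTF is single-byte text; the varchar step avoids storing UTF-16 bytes.
UPDATE myTable
SET myRtfBinary = CONVERT(varbinary(max), CONVERT(varchar(max), myRtfColumn));
GO

-- 3. Index the binary column and name the computed column as its type column.
CREATE FULLTEXT INDEX ON myTable
(
    myRtfBinary TYPE COLUMN Typ LANGUAGE 1033
)
KEY INDEX PK_myTable   -- assumed name of a unique, single-column, non-nullable index
ON myCatalog           -- assumed name of the existing full-text catalog
WITH CHANGE_TRACKING AUTO;
GO

-- 4. Query the binary column instead of the nvarchar one.
SELECT * FROM myTable WHERE CONTAINS(myRtfBinary, 'rtf');
```

A table can have only one full-text index, so the index created in the question would have to be dropped first (DROP FULLTEXT INDEX ON myTable) or extended with ALTER FULLTEXT INDEX instead.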

Related

SSIS Datatype Conversion issue

I have built an SSIS package with FlatFileImport (csv) --- DataConversion --- Lookup Transformation --- OLEDBDestination. The package has errors between the Data Conversion and the Lookup Transformation.
After the csv import I convert a csv field to decimal, because the field is decimal in the DB. But when I connect the csv column to the DB column in the Lookup Transformation, I get an error saying that the data types are different.
Any idea what the problem is?
Make sure that the SSIS data types of both columns used in the lookup match. The data type resulting from the decimal conversion should be DT_NUMERIC, which corresponds to the SQL Server decimal data type as stated in the mapping chart of the documentation.
To verify that the data type of the input column used in the mapping to match in the lookup is also DT_NUMERIC, right-click the Lookup and select Show Advanced Editor. Then go to the Input and Output Properties tab, open the Lookup Input node, expand the Input Columns folder below it and highlight the column used in the lookup. The Common Properties window on the right will show the data type.
If this is not DT_NUMERIC, change the lookup to use a SQL query instead and cast this column as a decimal (SQL Server) data type with a SQL command, then verify that it is now DT_NUMERIC in the Advanced Editor. I'm assuming the Lookup is to a SQL Server database; if not, see the other columns in the data mapping chart of the SSIS reference above.
You will also want to ensure that the scale and precision are the same for both columns used in the Lookup, which can be viewed in the Advanced Editor of the Lookup as well. For the Data Conversion Task, this can be found either in the regular editor or in the Advanced Editor by going to Input and Output Properties > Data Conversion Output > Output Columns and selecting the converted column.
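For the SQL-query variant of the Lookup described above, the query could look roughly like this. The table and column names (dbo.TargetTable, Amount, TargetId) and the precision and scale (18, 4) are hypothetical; they would need to match whatever the Data Conversion output column uses.

```sql
-- Hypothetical lookup query: cast the join column so that SSIS sees DT_NUMERIC
-- with the same precision and scale as the Data Conversion output column.
SELECT
    CAST(Amount AS decimal(18, 4)) AS Amount,  -- column matched in the Lookup
    TargetId                                   -- column returned by the Lookup
FROM dbo.TargetTable;
```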

Read BLOB column and convert it to xml in a file

I have a column that is a blob. Here is the screenshot:
As you can see, I have a TBDOCUMENTS table in which the DOCUMENT column is a BLOB. I want to read this column. I know that for a particular DOCUMENTURL it contains XML, so I want to convert it to XML and then write that XML to a file.
How can I do this in SQL Server? I am using SQL Server 2014.
To read the xml you stored inside a blob column (for example varbinary) you can use CONVERT:
select CONVERT(xml,(CONVERT(varbinary(max),DOCUMENT)))
from TBDOCUMENTS
where DOCUMENTURL='...'
Now you can write this string to an XML file. I think that SQL Server is probably not a good fit for this task; nonetheless there are many techniques to achieve this, for example with bcp.
As reported in the comments, a DTD-related error may occur when using CONVERT:
Parsing XML with internal subset DTDs not allowed. Use CONVERT with style option 2 to enable limited internal subset DTD support.
In this case, passing the xml style argument to CONVERT fixes the problem:
select CONVERT(xml,(CONVERT(varbinary(max),DOCUMENT)), 2)
from TBDOCUMENTS
where DOCUMENTURL='...'

SQL Blob to Base 64 in Table for FileMaker

I have looked and found some instances where something similar is being done for websites etc.
I have a SQL table that I am accessing in FileMaker Pro (through ESS) via an ODBC connection to the SQL database. I have everything I need, except that one field (LNL_BLOB) in one table (duo.MMOBJS) is an image, "(image, null)", which cannot be accessed via the ODBC connection.
What I am hoping to accomplish is that when an image is placed in the field, it is ALSO converted to Base64 in another field in the same table. Also, the database creator has a "View" (a foreign concept to us FileMaker developers) with this same data called "dbo.VW_BLOB_IMAGES", if that is helpful.
If there is a field with Base64 text, within FileMaker I can decode it to get the image.
What thoughts do you all have? Is there an even better way?
NOTE: I am using many tables and lots of the data in the app that I have made; this image is not the only reason I created the ODBC connection.
Well, one way to get base64 out of SQL would be to trick the XML engine in SQL to convert your column to base64, then strip out the XML:
SELECT SUBSTRING(Q.Base64Data, 7, LEN(Q.Base64Data)-9)
FROM (SELECT
(
SELECT LNL_BLOB AS B
FROM duo.MMOBJS
FOR XML raw('r'), BINARY BASE64
) AS [Base64Data]) AS [Q]
You'd probably want to add that to your select statement or a view, rather than add it to the table; but, you could write a trigger that would maintain the field using that definition.
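Because the subquery above has no WHERE clause, it concatenates every row into a single XML string, so the SUBSTRING offsets only line up when exactly one row is selected. A per-row variant could be sketched as follows. It relies on FOR XML PATH('') emitting an unnamed column as bare text; the alias Base64Data and the cast from image to varbinary(max) are assumptions on my part, and the query is untested against this schema.

```sql
-- One base64 string per row of duo.MMOBJS, with no XML wrapper to strip.
SELECT
    (
        SELECT CONVERT(varbinary(max), m.LNL_BLOB)  -- unnamed column: emitted as bare text
        FOR XML PATH(''), BINARY BASE64
    ) AS Base64Data
FROM duo.MMOBJS AS m;
```

Written this way the expression can go into a view next to the table's key columns, which fits the suggestion of using a select statement or view rather than a stored field.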

How can I use SQL Server to determine the length of a full-text indexed varbinary field?

I have stored a number of binary files in a SQL Server table. I created a full-text-index on that table which also indexes the binary field containing the documents. I installed the appropriate iFilters such that SQL Server can also read .doc, .docx and .pdf files.
Using the function DATALENGTH I can retrieve the length/size of the complete document, but this also includes layout and other useless information. I want to know the length of the text of the documents.
Using the iFilters SQL Server is able to retrieve only the text of such "complicated" documents but can it also be used to determine the length of just the text?
As far as I know (which isn't much), there is no way to query document properties via FTS. I would get the word count before inserting the document into the database, then insert the count along with it, into another column in the table. For Word documents, you can use the Document.Words.Count property; I don't know what the equivalent mechanism is for PDF documents.

How do I save xml exactly as is to a xml database field?

At the moment, if I save <element></element> to a SQL Server 2008 database in a field of type xml, it converts it to <element/>.
How can I preserve the xml empty text as is when saving?
In case this is a gotcha, I am utilising Linq to Sql as my ORM to communicate to the database in order to save it.
What you're asking for is not possible.
SQL Server stores data in xml columns as a binary representation, so any extraneous formatting is discarded, as you found out.
To preserve the formatting, you would have to store the content in a text field of type varchar(MAX) or nvarchar(MAX). Hopefully you don't have to run XML-based queries on the data.
http://msdn.microsoft.com/en-us/library/ms189887.aspx
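The behaviour can be reproduced with two variables. A minimal sketch (the variable names are arbitrary):

```sql
DECLARE @asXml  xml           = N'<element></element>';
DECLARE @asText nvarchar(max) = N'<element></element>';

-- The xml value is parsed on assignment; per the question it comes back as <element/>.
SELECT CAST(@asXml AS nvarchar(max)) AS FromXmlType;

-- The string is stored character for character, so the original form survives.
SELECT @asText AS FromTextType;
```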

Resources