fulltext index returning no results from pdf filestream - sql-server

I have a filestream table running on SQL Server 2012 on a Windows 8.1 x64 machine, which already has a few PDF and TXT files stored, so I decided to create a full-text index to search through these files by using the following command:
CREATE FULLTEXT CATALOG FileStreamFTSCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON storage
(FileName Language 1046, File TYPE COLUMN FileExtension Language 1046)
KEY INDEX PK__storage__3214EC077DADCE3C
ON FileStreamFTSCatalog
WITH CHANGE_TRACKING AUTO;
Then I sent these commands after reading some people having the same problem as me:
EXEC sp_fulltext_service @action='load_os_resources', @value=1;
EXEC sp_fulltext_service 'verify_signature', 0;
EXEC sp_fulltext_service 'update_languages';
EXEC sp_fulltext_service 'ft_timeout', 600000;
EXEC sp_fulltext_service 'ism_size', @value=16;
EXEC sp_fulltext_service 'restart_all_fdhosts';
EXEC sp_help_fulltext_system_components 'filter';
reconfigure with override
I can see the PDF IFilter configured
filter .pdf E8978DA6-047F-4E3D-9C78-CDBE46041603 C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll 11.0.1.36 Adobe Systems, Inc.
and I can even do a
select * from storage
where contains(*, 'data')
but it's returning only the TXT files indexed, so I'm wondering: is there anything else I need to do to start indexing my PDFs? Or is it necessary to create another table and reinsert all the PDFs I already have stored, even though the TXT files are getting indexed just fine?
UPDATE 1:
Opening the SQLFTXXX.LOG I get this message (for the FileTable):
2014-08-20 06:32:09.48 spid29s Warning: No appropriate filter was found during full-text index population for table or indexed view '[text_storage].[dbo].[storage_table]' (table or indexed view ID '355584405', database ID '7'), full-text key value '篰磧'. Some columns of the row were not indexed.
And this one (for the FileStream table):
2014-08-19 22:14:50.58 spid20s Warning: No appropriate filter was found during full-text index population for table or indexed view '[text_storage].[dbo].[storage]' (table or indexed view ID '674101442', database ID '7'), full-text key value '1797'. Some columns of the row were not indexed.

I ran into the same problem. I have a filestream table on SQL Server 2012 Standard populated with PDFs. I downloaded Adobe's iFilter 11 and created a full text index on the PDFs. I was not able to make it work in production--the filestream table was populated, but full text search was not, and this error occurred in the log: (SQL Server Log folder, SQLFTxxxxx.LOG):
Warning: No appropriate filter was found during full-text index population for table or indexed view
It turned out that the archive bit on the files was set to on. When I turned it off, the full text search populated and searches started to work.
Hope this helps someone else. Also, if you have insight into why it works this way, please let us know. From researching the archive bit, it appears that it indicates that the file is new or changed and in need of a backup. Thanks!
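For anyone checking this from the SQL side after fixing the files, here is a minimal sketch. It assumes the table is dbo.storage as in the question; adjust the name for your own table. It confirms that SQL Server has a filter registered for .pdf, restarts a full population, and reads the population status.

```sql
-- Is a filter registered for the .pdf document type?
SELECT document_type, path, version, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';

-- Re-run a full population of the full-text index (table name assumed from the question)
ALTER FULLTEXT INDEX ON dbo.storage START FULL POPULATION;

-- Population status: 0 = idle, 1 = full population in progress
SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.storage'), 'TableFulltextPopulateStatus') AS populate_status,
       OBJECTPROPERTYEX(OBJECT_ID('dbo.storage'), 'TableFulltextFailCount')      AS failed_rows;
```

Once the status is back to 0, the SQLFTxxxxx.LOG file should show whether the "No appropriate filter" warning is still being raised.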

There is another possible fix to this problem; installing some versions of Acrobat or Reader can break the PDF iFilter. Adobe posts this workaround:
https://helpx.adobe.com/acrobat/kb/pdf-search-breaks-110-install.html
Solution
Do one of the following:
Update to Acrobat/Reader 11.0.4 or higher. The issue is fixed in version 11.0.4.
PDF iFilter 9 is not supported on Windows 8, update to PDF iFilter 11 from here.
If you cannot update your Acrobat/Reader or PDF iFilter, here is the workaround.
Workaround: Restore the registry entry to the Windows 8 native entry as follows:
Go to HKEY_CLASSES_ROOT\.pdf\PersistentHandler. Create the key if it does not exist.
Verify that the value is 1AA9BF05-9A97-48c1-BA28-D9DCE795E93C. If the Acrobat or Reader install overwrote the entry with F6594A6D-D57F-4EFD-B2C3-DCD9779E382E, return it to its original value.
If you have any third-party PDF iFilters installed, reinstall them.
Restart the Windows Search service:
Go to Task Manager > Services.
Select WSearch.
Right-click, and then choose Restart.

I had to use Adobe iFilter 9 for sql server 2014 and 2017.
ftp://ftp.adobe.com/pub/adobe/acrobat/win/9.x/PDFiFilter64installer.zip

I've finally found a solution. After trying both the Adobe and Foxit iFilters with the same error message, I found another iFilter called "PDFlib". I downloaded it and followed its instructions to make it available to SQL Server, rebuilt the index, and now my PDFs are indexed and can be searched.
I believe that if I follow these same instructions for the other iFilters they will work as well. I'm going to try that after I'm done with my tests and update with the results.
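The rebuild step can be done from T-SQL. This is only a sketch using the catalog and table names from the question (FileStreamFTSCatalog and dbo.storage); the second query is one way to see which rows actually produced index entries after the rebuild.

```sql
-- Rebuild the whole catalog so every row is passed through the filters again
ALTER FULLTEXT CATALOG FileStreamFTSCatalog REBUILD;

-- After the rebuild finishes: number of indexed keywords per row.
-- document_id corresponds to the full-text key value of the row.
SELECT document_id, COUNT(*) AS keyword_count
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('dbo.storage'))
GROUP BY document_id
ORDER BY document_id;
```

A PDF row whose content was not filtered should show far fewer keywords than a comparable TXT row, or only the terms coming from the FileName column.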

Related

Add a Word docx as a DEFAULT to a varbinary(max) column

I would like to store MS Word documents (docx) in SQL Server by default. What I have been able to do so far is create a varbinary(max) column in SQL Server, link to this table via MS Access, and drag Word documents into this column via MS Access (in Access it is coming up as OLE Object column). This is great and the files were saved/uploaded successfully, but ideally I would like these documents in this table by default, or when the record is created. I hope that's clear.
Here is what I've tried...
I tried to add this document via MS Access (as mentioned above). This worked fine. I copied the blob text in this column (the 0x0151... text below), and created the following constraint:
ALTER TABLE [dbo].[CM_PROJECTS] ADD CONSTRAINT [DF_CM_PROJECTS_ATTACHMENT] DEFAULT (0x0151...D2CC72D) FOR [CM_DOCUMENTATION]
This seems to work in SQL Server. More specifically, when I add a record to CM_PROJECTS, the CM_DOCUMENTATION column has this blob in there when I query the table in SSMS. However in MS Access, when I try to see the contents of this field, I get nothing. The field is blank in MS Access (I tried to refresh).
I also tried what was mentioned in this thread, but this didn't play nice with the ALTER command (as above). I kept getting syntax errors.
Any ideas?

How to add a custom dictionary to Microsoft SQL Full Text Search?

I am struggling with how to get Microsoft SQL Full Text Search to search against words that have word breakers in them such as A-123, AB-123, or ABC-123. The out of the box English word breaker wants to split these words at the dash. The words with dashes in them are a known set. I came across this article which discusses a possible solution, but I can't seem to get it to work. I am running SQL 2014 Enterprise Edition with SP 1. I created a text file with the following contents:
A-123
AB-123
ABC-123
I then restarted the full text service using exec sp_fulltext_service 'restart_all_fdhosts'.
I then tested to see if the solution worked by executing select [display_term],* from sys.dm_fts_parser('ABC-123', 1033,0,0). If working properly I would expect this to return 1 row (exact match for abc-123), but it is still returning 4 rows (abc-123, abc, 123, nn123)
The previous article mentions files that have to be copied and settings that have to be changed. My Windows 10 workstation only had NlsData0009.dll and NlsLexicons0009.dll (which I did copy to C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Binn). NlsGrammars0009.dll was not on my workstation. It feels like these instructions are too specific to SQL 2008.
Assuming I can get the custom dictionary to work, I will then need to figure out how to apply a different custom dictionary to each database. There are SQL Servers with multiple databases where each database would need its own copy of a custom dictionary.
The Custom Dictionary file you create needs to be named "Custom0009.lex" for English, and placed in the following directory, where "C:\Program Files" is the install path of your SQL Instance:
C:\Program Files\Microsoft SQL Server\<instance>\MSSQL\Binn
It is important to note that the dictionary file you create HAS to be Unicode-encoded; SQL Server will simply ignore it if it is not.
For other languages you need to change the "0009" part of the file name according to the Language hexadecimal code in Table 2 of this article: Create a custom dictionary
Unfortunately the Custom Dictionary file applies to the entire SQL instance, so it won't be possible to have a different one per database. The only solution here would be to create an instance for each database.

SQL Server 2008 R2 (64-bits) - Service Pack 2 - UC5 can't find Adobe's iFilter 11.0

I am getting the following error when rebuilding a catalog containing a table with a stored PDF (it does work for Word documents).
Warning: No appropriate filter was found during full-text index
population for table or indexed view '[Test].[dbo].[Table_1]' (table
or indexed view ID '2105058535', database ID '6'), full-text key value
'911'. Some columns of the row were not indexed.
I followed the installation procedures from Adobe and ran the following commands:
EXEC sp_fulltext_service @action='load_os_resources', @value=1; -- update os resources
EXEC sp_fulltext_service 'verify_signature', 0 -- don't verify signatures
EXEC sp_fulltext_service 'update_languages'; -- update language list
EXEC sp_fulltext_service 'restart_all_fdhosts'; -- restart daemon
EXEC sp_help_fulltext_system_components 'filter'; -- view active filters
The last does return the correct filter path:
filter .pdf E8978DA6-047F-4E3D-9C78-CDBE46041603 C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll 11.0.1.36 Adobe Systems, Inc.
I have added the path to C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\ and verified that it works.
I have restarted the services (and even rebooted the machine). I also ran the filtdump.exe tool that comes with the Windows SDK to verify that the filter does work OUTSIDE of SQL Server 2008 R2.
Also I have re-configured the sql server services so they run with an admin account (in case the problem is related to permissions).
Lastly, I have tried on several machines (some running SP1) with the same result. No problems registering the DLL... SQL Server simply does not call the filter. Note that I have tried uploading a document to the same table with an "unknown" extension (e.g. ".xyz") and I get the same result... It is as if ".pdf" were an unregistered extension (however, it is registered).
Any suggestion?
I ran into all kinds of weird problems when trying to solve this issue. The solution was to grant the SQL Server database engine service account full access to the Adobe iFilter DLL's bin directory.
My tests were done with Adobe 9.0. We first tried the Adobe PDF 11 filter with no luck, reinstalled Adobe PDF 9, no luck, then reinstalled SQL Server 2008 R2 + SP2 + Adobe PDF, still no luck.
I rebuilt the catalogs, and it still did not work. Finally, since you mentioned the possibility that SQL Server was simply not calling the DLL, I suspected an access permission problem. Granting the permissions above made it work.
Maybe this would make Adobe PDF 11 work too, but remember to correct the path for version 11.
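To find out which account needs the permission, the service accounts can be listed from T-SQL. This is a sketch that assumes a build where sys.dm_server_services is available (SQL Server 2008 R2 SP1 or later); on older builds, check the accounts in SQL Server Configuration Manager instead.

```sql
-- Lists the SQL Server, SQL Server Agent and full-text launcher services
-- together with the Windows account each one runs under
SELECT servicename, service_account, status_desc
FROM sys.dm_server_services;
```

The account shown for the database engine is the one this answer granted full access on the iFilter bin directory.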

How can I use a SQL Scripts in a Database Project with the System.Data.SQLite data provider?

I've got a project where I'm attempting to use SQLite via System.Data.SQLite. In my attempts to keep the database under version-control, I went ahead and created a Database Project in my VS2008. Sounds fine, right?
I created my first table create script and tried to run it using right-click->Run on the script and I get this error message:
This operation is not supported for the provider or data source you are using.
Does anyone know if there's an automatic way to use scripts that are part of database project against SQLite databases referenced by the databases, using the provider supplied by the System.Data.SQLite install?
I've tried every variation I can think of in an attempt to get the script to run using the default Run or Run On... commands. Here's the script in its most verbose and probably incorrect form:
USE Characters
GO
IF EXISTS (SELECT * FROM sysobjects WHERE type = 'U' AND name = 'Skills')
BEGIN
DROP Table Skills
END
GO
CREATE TABLE Skills
(
SkillID INTEGER PRIMARY KEY AUTOINCREMENT,
SkillName TEXT,
Description TEXT
)
GO
Please note, this is my first attempt at using a Database, and also the first time I've ever touched SQLite. In my attempts to get it to run, I've stripped any and everything out except for the CREATE TABLE command.
UPDATE: Ok, so as Robert Harvey points out below, this looks like an SQL Server stored procedure. I went into the Server Explorer and used my connection (from the Database project) to do what he suggested regarding creating a table. I can generate SQL to create the table, and it comes out like this:
CREATE TABLE [Skills] (
[SkillID] integer PRIMARY KEY NOT NULL,
[SkillName] text NOT NULL,
[Description] text NOT NULL
);
I can easily copy this and add it to the project (or add it to another project that handles the rest of my data access), but is there any way to automate this on build? I suppose, since SQLite is a single file in this case, that I could also keep the built database under version control.
Thoughts? Best practices for this instance?
UPDATE: I'm thinking that, since I plan on using Fluent NHibernate, I may just use its auto-persistence model to keep my database up to snuff and effectively in source control. Thoughts? Pitfalls? I think I'll have to keep initial population inserts in source control separately, but it should work.
I built my database using an SQLite SQL script and then fed that into the sqlite3.exe console program like this.
c:\sqlite3.exe mydatabase.db < FileContainingSQLiteSQLCommands
John
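As an illustration of what such a file could contain, here is the script from the question rewritten in SQLite's dialect. It is a sketch, not the file I actually used: USE, GO and the sysobjects check are T-SQL constructs, so they are replaced by DROP TABLE IF EXISTS, and the NOT NULL constraints are taken from the generated SQL in the question.

```sql
-- SQLite script: no USE / GO batches, no sysobjects catalog
DROP TABLE IF EXISTS Skills;

CREATE TABLE Skills (
    SkillID     INTEGER PRIMARY KEY AUTOINCREMENT,
    SkillName   TEXT NOT NULL,
    Description TEXT NOT NULL
);
```

Because the file is plain text, it can be kept in version control and fed to sqlite3.exe as a build step.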
Well, your script looks like a SQL Server stored procedure. SQLite most likely doesn't support this, because
It doesn't support stored procedures, and
It doesn't understand SQL Server T-SQL
SQL is actually a pseudo-standard. It differs between vendors and sometimes even between different versions of a product within the same vendor.
That said, I don't see any reason why you can't run any (SQLite compatible) SQL statement against the SQLite database by opening up connection and command objects, just like you would with SQL Server.
Since, however, you are new to databases and SQLite, here is how you should start. I assume you already have SQLite installed
Create a new Windows Application in Visual Studio 2008. The database application will be of no use to you.
Open the Server Explorer by pulling down the View menu and selecting Server Explorer.
Create a new connection by right-clicking on the Data Connections node in Server Explorer and clicking on Add New Connection...
Click the Change button
Select the SQLite provider
Give your database a file name.
Click OK.
A new Data Connection should appear in the Server Explorer. You can create your first table by right-clicking on the Tables node and selecting Add New Table.

How to version control SQL Server databases?

I have SQL Server databases and make changes to them. Some database tables have records that are starting records required for my app to run. I would like to version control the database and these records (rows). Is it possible to do this and bundle it with the SVN version control I have for my source code, or are there other solutions? I would like to be able to return to a previous version of the database and compare changes between database revisions. It would be nice if the tools for this are free, open source or not very expensive.
My environment is Visual C# Express, SQL Server 2008 Express and Tortoise SVN.
Late answer but hopefully useful to other readers
I can suggest using the SSMS add-in called ApexSQL Source Control. With this add-in, developers can map database objects to the source control system via a wizard directly from SSMS. It supports Git, TFS (including Visual Studio Online), Mercurial, Subversion and other source control systems. It also supports source controlling static data (so you can version control records as well).
After downloading and installing ApexSQL Source Control, simply right-click the database you want to version control and navigate to the ApexSQL Source Control sub-menu in SSMS. Click the “Link database to source control” option and select the source control system and the database development model.
After that, you may exclude objects you don’t want to be linked to source control. It is possible to exclude specific objects by owner or type.
On the next step, you will be prompted to provide the log-in information for the source control management system.
Once done, just click the “Finish” button and the “Action center” window will be shown, offering the objects that will be committed to the repository (this is by default, if the repository is empty).
Once the database has been linked to source control, all the operations that can be executed from a source control client will be available from the “Object Explorer” pane. Those include:
checking out the versioned objects, with or without a lock,
viewing the history of an object and applying a specific revision,
viewing the changes that were made to an object, and
placing table data under source control using the “Link static data” option.
You can read this article for more information: http://solutioncenter.apexsql.com/sql-source-control-reduce-database-development-time/
We've just started doing the following on some of our projects, and it seems to work quite well, for populating "static" tables.
Our scripts follow a pattern where a temp table is constructed, and is then populated with what we want the real table to resemble. We only put human readable values here (i.e. we don't include IDENTITY/GUID columns). The remainder of the script takes the temp table and performs appropriate INSERT/UPDATE/DELETE statements to make the real table resemble the temp table. When we have to change this "static" data, all we have to update is the population of the temp table. This means that DIFFing between versions works as expected, and rollback scripts are as simple as getting a previous version from source control.
The INSERT/UPDATE/DELETEs only have to be written once. In fact, our scripts are slightly more complicated, and have two sets of validation run before the actual DML statements. One set validate the temp table data (i.e. that we're not going to violate any constraints by attempting to make the database resemble the temp table). The other validate the temp table and the target database (i.e. that foreign keys are available).
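A minimal sketch of this pattern is below. The table dbo.OrderStatus, its columns and the sample rows are hypothetical, made up purely for illustration, and the validation steps mentioned above are left out.

```sql
-- Desired contents of the hypothetical static table dbo.OrderStatus(Code, Description).
-- Only this part changes when the static data changes.
CREATE TABLE #OrderStatus (
    Code        varchar(20)   COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY,
    Description nvarchar(100) COLLATE DATABASE_DEFAULT NOT NULL
);

INSERT INTO #OrderStatus (Code, Description) VALUES
    ('NEW',     N'Order received'),
    ('SHIPPED', N'Order shipped'),
    ('CLOSED',  N'Order completed');

BEGIN TRANSACTION;

-- Update rows whose human-readable values differ
UPDATE t
SET    t.Description = s.Description
FROM   dbo.OrderStatus AS t
JOIN   #OrderStatus    AS s ON s.Code = t.Code
WHERE  t.Description <> s.Description;

-- Insert rows that are missing from the real table
INSERT INTO dbo.OrderStatus (Code, Description)
SELECT s.Code, s.Description
FROM   #OrderStatus AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.OrderStatus AS t WHERE t.Code = s.Code);

-- Delete rows that are no longer wanted
DELETE t
FROM   dbo.OrderStatus AS t
WHERE  NOT EXISTS (SELECT 1 FROM #OrderStatus AS s WHERE s.Code = t.Code);

COMMIT TRANSACTION;

DROP TABLE #OrderStatus;
```

A diff between two versions of such a script shows only the changed VALUES rows, which is what makes the history readable.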
Static data support is being added to SQL Source Control 2.0, currently available in beta. More information on how to try this can be found here:
http://www.red-gate.com/messageboard/viewtopic.php?t=12298
There is a free Microsoft product called Database Publishing Wizard which you can use to script the entire database (schema and data). It is great for taking snapshots of the current state of a DB and will enable you to recreate it from scratch at any point.
For database (schema) versioning we use custom properties, which are added to the database when the installer is run. The contents of these scripts are generated by our build scripts.
The script to set the properties looks like this:
DECLARE @AssemblyDescription sysname
SET @AssemblyDescription = N'DailyBuild_20090322.1'
DECLARE @AssemblyFileVersion sysname
SET @AssemblyFileVersion = N'0.9.3368.58294'
-- The extended properties DatabaseDescription and DatabaseFileVersion contain the
-- AssemblyDescription and AssemblyFileVersion of the build that was used for the
-- database script that creates the database structure.
--
-- The current value of these properties can be displayed with the following query:
-- SELECT * FROM sys.extended_properties
IF EXISTS (SELECT * FROM sys.extended_properties WHERE class_desc = 'DATABASE' AND name = N'DatabaseDescription')
BEGIN
EXEC sys.sp_updateextendedproperty @name = N'DatabaseDescription', @value = @AssemblyDescription
END
ELSE
BEGIN
EXEC sys.sp_addextendedproperty @name = N'DatabaseDescription', @value = @AssemblyDescription
END
IF EXISTS (SELECT * FROM sys.extended_properties WHERE class_desc = 'DATABASE' AND name = N'DatabaseFileVersion')
BEGIN
EXEC sys.sp_updateextendedproperty @name = N'DatabaseFileVersion', @value = @AssemblyFileVersion
END
ELSE
BEGIN
EXEC sys.sp_addextendedproperty @name = N'DatabaseFileVersion', @value = @AssemblyFileVersion
END
GO
You can get a version of SQL Management Studio for SQL Server Express. I believe you'll be able to use this to produce scripts of the schema of your database. I think that will leave you to create scripts by hand for inserting the starting records.
Then, put all the scripts into source control, along with a master script that runs the individual scripts in the correct order.
You'll be able to run diffs using windiff (free with Visual Studio SDK), or else Beyond Compare is inexpensive, and a great diff/merge/sync tool.
MS Visual Studio Team System for Database Developers has functionality to easily generate create scripts for the whole schema. Only drawback is the cost!
Have you considered using SubSonic?
You should rather use DB specific versioning.
http://msdn.microsoft.com/en-us/library/ms189050.aspx
When either the READ_COMMITTED_SNAPSHOT or ALLOW_SNAPSHOT_ISOLATION database options are ON, logical copies (versions) are maintained for all data modifications performed in the database. Every time a row is modified by a specific transaction, the instance of the Database Engine stores a version of the previously committed image of the row in tempdb. Each version is marked with the transaction sequence number of the transaction that made the change. The versions of modified rows are chained using a link list. The newest row value is always stored in the current database and chained to the versioned rows stored in tempdb.
I use bcp for this (bulk loading utility, part of a standard SQL Server install, Express edition included).
Each table with data needs a control file Table.ctl and a data file Table.csv (these are text files that can be generated from an existing database using bcp). As text files, these can very easily be versioned.
As part of my generation batches (see my answer there for more information), I iterate through every control file like this :
SET BASE_NAME=MyDatabaseName
SET CONNECT_STRING=.\SQLEXPRESS
FOR /R %%i IN (.) DO (
FOR %%j IN ("%%~fi\*.ctl") DO (
ECHO + %%~nj
bcp %BASE_NAME%..%%~nj in "%%~dpsj%%~nj.csv" -T -E -S %CONNECT_STRING% -f "%%~dpsj%%~nj.ctl" >"%TMP%\%%~nj.log"
IF ERRORLEVEL 1 (
TYPE "%TMP%\%%~nj.log"
GOTO ERROR_USAGE
)
)
)
A current limitation of this script is that the name of the file must be the name of the table, which may not be possible if the table name contains specific special characters.
This project has a good example of deploy and rollback
