I need to implement a service to search PDFs. I initially started with SQL Server 2008 FTS, but soon realized that my PDFs would have to be stored in the DB itself. I was then pointed to Indexing Service, as well as to the SQL Server 2008 FILESTREAM attribute, which lets me store the PDFs in the file system. So how do these three (Indexing Service, FTS, and FILESTREAM) relate to each other? Do I need to use all three together to implement my search?
Also, do hosting services like DiscountASP typically have these enabled? Or should I consider switching to Lucene.NET?
We used to use a PDF iFilter, which allows you to store the PDF in the DB and then run an FTS query against it. However, we now convert our PDFs to text and store the text in the full-text index. This lets us keep all our documents (.doc, .pdf, etc.) in the same index.
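The "convert to text, then index the text" approach can be illustrated in a few lines. This is a minimal sketch in Python, not SQL Server full-text search: it assumes the text has already been extracted from each .doc or .pdf (extraction is not shown), and `TextIndex` is a hypothetical class. Once every file is reduced to plain text, all formats can share one index.

```python
import re
from collections import defaultdict


class TextIndex:
    """Hypothetical single inverted index shared by all document types."""

    def __init__(self) -> None:
        # term -> set of document ids whose extracted text contains it
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Lowercased word tokens only; a real full-text engine also handles
        # stemming, stopwords, phrases, ranking, and so on.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: str, extracted_text: str) -> None:
        # The original format (.pdf, .doc, ...) no longer matters at this point.
        for term in self._tokenize(extracted_text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        # AND semantics: documents that contain every term of the query.
        terms = self._tokenize(query)
        if not terms:
            return set()
        # .get() avoids creating empty entries in the defaultdict.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

For example, after adding a résumé (.doc) and an invoice (.pdf) that both mention "SQL Server", a single `search("sql server")` call finds both, regardless of their original formats.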
DiscountASP does allow FTS/iFTS on the hosted database.
If you know in advance what you want to find (e.g., you receive hundreds of PDFs a day and will need to find the ones containing certain strings that are known before the files arrive), then you could make a text version of each PDF on reception, create index entries for the PDF file, and then throw away the text.
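A minimal sketch of that reception-time idea, assuming the text has already been extracted from the PDF and the list of known strings is fixed in advance (`KNOWN_TERMS`, `new_index`, `index_on_reception`, and `files_containing` are hypothetical names). It uses plain case-insensitive substring matching, and only the term-to-file entries are kept, so the extracted text can be discarded afterwards.

```python
from collections import defaultdict

# Hypothetical strings that are known before any PDF arrives.
KNOWN_TERMS = ["purchase order", "overdue", "contract renewal"]


def new_index() -> dict[str, set[str]]:
    # term -> names of the PDF files that contained it at reception time
    return defaultdict(set)


def index_on_reception(
    pdf_name: str,
    extracted_text: str,
    known_terms: list[str],
    index: dict[str, set[str]],
) -> None:
    """Record which known terms occur in a newly received PDF.

    The extracted text is not stored anywhere, so the caller can throw it
    away as soon as this function returns.
    """
    # Lowercase and collapse whitespace so terms split across lines still match.
    haystack = " ".join(extracted_text.lower().split())
    for term in known_terms:
        needle = " ".join(term.lower().split())
        if needle in haystack:
            index[term].add(pdf_name)


def files_containing(term: str, index: dict[str, set[str]]) -> set[str]:
    # .get() avoids inserting empty entries into the defaultdict.
    return index.get(term, set())
```

Lookups are then instant, but a term that was not in the list at reception time cannot be found without extracting the text again.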
If you do not know the search terms in advance, life becomes much slower :( There is a program called PDF Search that claims to do full-text search in PDF files. I haven't needed to use it, so I can't say how it is, but it's here: http://www.getpdf.com/.
Hope this helps
I would like to collect my orders in a very simple relational database, with some PDFs stored in a binary field. I would like to browse the PDFs easily without building a frontend. There are many SQL admin tools that are perfect for browsing databases, but I have not found any free tool that can display the PDFs stored in them or open them easily.
Which existing free and platform independent tool is able to show PDFs in a database?
For example:
SQLiteStudio has an "Edit value" window, which can display a stored image.
SQL Maestro is able to do it, but is not free.
Images can also be shown in LibreOffice Base and Microsoft Access.
In MS Access VBA it is possible to embed ActiveX applications or open a PDF stored at a path, but Access is not platform independent and VBA is very messy.
I would also be happy if a tool stored a temporary copy of the file on the hard drive and opened it automatically. I just want a single-click solution.
I did not specify the actual database format, as I am happy to use any relational database that already has such a tool. I want to use a database instead of a magical folder structure.
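Not an existing tool, but if no viewer turns up, the "store a temporary copy and open it" fallback is short to script. A minimal sketch, assuming SQLite (through Python's built-in `sqlite3` module) and a hypothetical table `orders(id INTEGER PRIMARY KEY, pdf BLOB)`: the PDF is written to a temporary file and handed to the operating system's default viewer.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile


def export_pdf_to_tempfile(conn: sqlite3.Connection, order_id: int) -> str:
    """Write the PDF stored for one order to a temporary file; return its path.

    Assumes the hypothetical table orders(id INTEGER PRIMARY KEY, pdf BLOB).
    """
    row = conn.execute("SELECT pdf FROM orders WHERE id = ?", (order_id,)).fetchone()
    if row is None or row[0] is None:
        raise LookupError(f"no PDF stored for order {order_id}")
    # delete=False keeps the file on disk after closing, so a viewer can open it.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(row[0])
        return tmp.name


def open_with_default_viewer(path: str) -> None:
    """Open a file with the default application on Windows, macOS, or Linux."""
    if sys.platform.startswith("win"):
        os.startfile(path)  # only exists on Windows
    elif sys.platform == "darwin":
        subprocess.run(["open", path], check=False)
    else:
        subprocess.run(["xdg-open", path], check=False)  # Linux desktops
```

For example, `open_with_default_viewer(export_pdf_to_tempfile(conn, 42))` opens the PDF of order 42. The temporary files are not deleted automatically, so clean-up is left to the caller.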
I have SQL Server 2012 and started looking into Filestream as a way to link "attachments" (> 1 GB) such as Excel documents and PDF files to database table records. While I have been successful in finding the "hello world" T-SQL examples that allow me to do some rudimentary tasks (enable Filestream, create table with Filestream column, insert row, etc.) I also encounter "beware" statements.
Is Filestream really as temperamental and full of "gotchas" as the various forum posts suggest, or is it straightforward, with at least predictable quirks?
Thank you for any insight
I would recommend using FileTable, which is implemented on top of FILESTREAM and is a really neat enhancement: it preserves the functionality of FILESTREAM, but unlike plain FILESTREAM it can be configured to also allow access to the files from outside the database engine. For example, you can let IIS use those files directly. We have run it in production at a big telecom company since 2013, and it has been flawless!
P.S. This is perhaps more suitable as a comment, but I don't have enough reputation to write one :)
I am writing an ASP.NET web application that stores applicants' data in a SQL Server database.
An applicant might submit a name, address, telephone number, and a file.
The file might have any extension, including .docx for a resume, .jpg or .pdf for photos, or even an Excel file.
Is it possible to store files with all these extensions in my database?
Or will that be lengthy?
Please help
Good question! Personally, I would use FILESTREAM in your case, and here's why:
In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determines whether you should use database storage or file system storage. If the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
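The rule of thumb quoted above can be written down as a tiny helper. This only illustrates the quoted guideline: the function name and its inputs are hypothetical, and "1 MB" is assumed to mean 1,048,576 bytes. It says when FILESTREAM is worth considering, not when it is guaranteed to be faster.

```python
from statistics import mean

# Assumption: "1 MB" in the guideline is read as 1,048,576 bytes.
ONE_MB = 1024 * 1024


def should_consider_filestream(
    object_sizes_bytes: list[int],
    fast_reads_important: bool,
    uses_middle_tier: bool,
) -> bool:
    """Return True only when all three quoted conditions hold.

    Otherwise plain varbinary(max) BLOBs stored in the table remain the default,
    which the guideline notes often stream better for smaller objects.
    """
    if not object_sizes_bytes:
        return False
    average_is_large = mean(object_sizes_bytes) > ONE_MB
    return average_is_large and fast_reads_important and uses_middle_tier
```

Uploads that average several megabytes and are read back through a web tier would pass this check, while small photos of a few hundred kilobytes would not.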
You can read up on FILESTREAM here.
Also consider using it in conjunction with FILETABLE.
Finally, here's a .NET (C#) example of how to read from a FILESTREAM column.
Please note, FILESTREAM is available starting with SQL Server 2008.
Hope it helps!
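On the question of whether any file extension can be stored: the database only sees bytes, so yes. A minimal sketch using Python's built-in `sqlite3` as a stand-in for SQL Server (the `applicant_files` table and the helper names are hypothetical); in SQL Server the data column would be varbinary(max), optionally with the FILESTREAM attribute. Storing the original file name and a MIME type alongside the bytes makes it possible to serve the file back correctly later.

```python
import mimetypes
import sqlite3


def create_applicant_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema. In SQL Server, `data` would be varbinary(max), optionally
    # declared FILESTREAM; neither cares what extension the original file had.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS applicant_files (
               id INTEGER PRIMARY KEY,
               applicant_name TEXT NOT NULL,
               file_name TEXT NOT NULL,
               content_type TEXT NOT NULL,
               data BLOB NOT NULL
           )"""
    )


def save_upload(conn: sqlite3.Connection, applicant_name: str, file_name: str, data: bytes) -> int:
    # Guess a MIME type from the extension; fall back to a generic binary type.
    content_type, _encoding = mimetypes.guess_type(file_name)
    cur = conn.execute(
        "INSERT INTO applicant_files (applicant_name, file_name, content_type, data) "
        "VALUES (?, ?, ?, ?)",
        (applicant_name, file_name, content_type or "application/octet-stream", data),
    )
    conn.commit()
    return cur.lastrowid


def load_upload(conn: sqlite3.Connection, file_id: int) -> tuple[str, str, bytes]:
    # Returns (original file name, MIME type, raw bytes), e.g. for sending to a browser.
    row = conn.execute(
        "SELECT file_name, content_type, data FROM applicant_files WHERE id = ?",
        (file_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no file with id {file_id}")
    return row[0], row[1], row[2]
```

A .docx résumé, a .jpg photo, and an Excel workbook all go through the same `save_upload` call; only the stored `content_type` differs.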
I am using T-SQL and SQL Server Management Studio 2008 R2. I want to create a database in which I can store video files.
After a Google search and some reading, I learned that there is an option to use a "FILESTREAM-enabled database". It was said that this kind of database should be used only when your files are larger than 2 MB. I want to store video files, so I think this suits my goals.
Please, give me more information about the main differences between using a BLOB, using a FILESTREAM-enabled database, or just storing the files in a given directory and saving only the URL in a database table column.
Thanks in advance.
FILESTREAM was an interesting change for me when it came in; the bit that surprised me was that Full-Text Search was taken out of the operating system (and integrated into the database engine) because that caused issues, while FILESTREAM put BLOB storage back into the file system because BLOBs inside the database caused issues.
Using FILESTREAM is basically transparent to your application, and it even backs up the files as if they were in the database. That is the big benefit (or cost) compared with either saving the file in the database or saving only a pointer in the database.
You can insert files the same way as before, and you can read them back in SQL in exactly the same way. The difference, and the benefit, is that SQL Server can use the Windows system cache for reading files, saving its own resources (the buffer pool) so that other queries run quicker.
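For comparison, the "save pointer in database" alternative from the question (files in a directory, only a path in a table column) can be sketched as follows, again with Python's `sqlite3` standing in for SQL Server and a hypothetical `videos` table. Note what the database no longer guarantees in this sketch: the file write and the INSERT are not one transaction, and a database backup does not include the files. Those are the gaps FILESTREAM closes by keeping the files under the database's transactions and backups.

```python
import sqlite3
import uuid
from pathlib import Path


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: only metadata and a pointer (relative path) live in the DB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        " id INTEGER PRIMARY KEY,"
        " original_name TEXT NOT NULL,"
        " relative_path TEXT NOT NULL)"
    )


def store_as_file(conn: sqlite3.Connection, storage_dir: Path, original_name: str, data: bytes) -> int:
    storage_dir.mkdir(parents=True, exist_ok=True)
    # A random file name avoids clashes; keeping the extension helps players.
    stored_name = uuid.uuid4().hex + Path(original_name).suffix
    (storage_dir / stored_name).write_bytes(data)
    # If the INSERT fails now, the file is left orphaned on disk.
    cur = conn.execute(
        "INSERT INTO videos (original_name, relative_path) VALUES (?, ?)",
        (original_name, stored_name),
    )
    conn.commit()
    return cur.lastrowid


def read_stored_file(conn: sqlite3.Connection, storage_dir: Path, video_id: int) -> bytes:
    row = conn.execute("SELECT relative_path FROM videos WHERE id = ?", (video_id,)).fetchone()
    if row is None:
        raise LookupError(f"no video with id {video_id}")
    # Raises FileNotFoundError if someone moved or deleted the file outside the
    # application: the database cannot guarantee that the pointer is still valid.
    return (storage_dir / row[0]).read_bytes()
```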
Please, give me more information about the main difference in using BLOB and FileStream Enable database
The feature you are asking about is called "FILESTREAM", not "FileStream enable".
There are also some blog posts around, such as http://blogs.msdn.com/b/rdoherty/archive/2007/10/12/getting-traction-with-sql-server-2008-filestream.aspx
At least try reading the documentation before running around and having other people do your basic groundwork.
We have stored all our media in SQL Server FILESTREAM, but now we'll need video and audio streaming. Will this be possible with FILESTREAM, or will I have to take all of the video and audio out of the database?
Which technology would you use to enable video/audio streaming?
WebORB
FluorineFX
Wowza (way better I think than the first two)
IIS Media (haven't looked into this yet)
When using IIS Media Services, it's not possible to store the data in SQL Server FILESTREAM.
For further details, check here.
It's possibly very similar with the rest of your suggested solutions, since all of them need to re-encode the material to enable streaming (if it isn't already in the necessary format).
You actually have 2 problems:
Re-encoding the videos into a format that lets you stream them via the server platform you choose. Just for this part, you need to extract the files from the DB, since the encoding tools can't be fed from a database, even if it's a SQL Server FILESTREAM.
Storing the encoded files somewhere the media servers can access them. Again, they don't accept SQL Server as a data source; they probably have their own storage infrastructure or use the file system.
Conclusion:
FILESTREAM is extremely helpful when you have full control over the server and client, but sadly that is not your case.
You will probably have to extract all files from the DB.
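Extracting everything is mostly a loop over the table. A minimal sketch, assuming a hypothetical table `media(id INTEGER PRIMARY KEY, file_name TEXT, data BLOB)` and using Python's `sqlite3` in place of SQL Server; the exported folder is then what you point the encoder and the media server at. It loads each BLOB fully into memory, so for multi-gigabyte videos you would want to read the data in chunks instead.

```python
import sqlite3
from pathlib import Path


def export_media(conn: sqlite3.Connection, target_dir: Path) -> list[Path]:
    """Copy every stored media BLOB out of the database into target_dir.

    Assumes the hypothetical table media(id INTEGER PRIMARY KEY, file_name TEXT, data BLOB).
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for media_id, file_name, data in conn.execute(
        "SELECT id, file_name, data FROM media ORDER BY id"
    ):
        # Prefix with the id so rows sharing a file name don't overwrite each other,
        # and keep only the base name so nothing is written outside target_dir.
        path = target_dir / f"{media_id}_{Path(file_name).name}"
        path.write_bytes(data)
        written.append(path)
    return written
```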
The FileTable feature in SQL Server "Denali" (since released as SQL Server 2012) is designed specifically for this scenario (amongst others).
There's a good overview link here: Using FileTables to Manage Unstructured FILESTREAM Data.
This will allow you to access and play these files directly through a provided UNC path without requiring any changes to the application, so you can use any of the above-mentioned streaming servers.