I have a number of documents in PDF, Word and email formats. How can I store these documents so that they can be searched by metadata?
Non-textual content like that can be stored in an Oracle database in a BLOB column (a CLOB is meant for character data).
What sort of metadata do you want to be able to search on?
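As a rough illustration of the idea above, the sketch below keeps the raw file bytes in a binary column and puts the searchable metadata in ordinary columns next to it. It is only a sketch under assumptions: SQLite stands in for Oracle so the example runs anywhere, and the table name, column names and the two helper functions are hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle here; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id        INTEGER PRIMARY KEY,
        file_name TEXT NOT NULL,
        doc_type  TEXT NOT NULL,   -- 'pdf', 'word' or 'email'
        author    TEXT,
        created   TEXT,            -- ISO 8601 date
        content   BLOB NOT NULL    -- raw file bytes, never searched directly
    )
    """
)
# Index the metadata columns that queries will filter on.
conn.execute("CREATE INDEX idx_documents_meta ON documents (doc_type, author)")


def store_document(file_name, doc_type, author, created, data):
    """Insert one file together with its metadata and return the new row id."""
    cur = conn.execute(
        "INSERT INTO documents (file_name, doc_type, author, created, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (file_name, doc_type, author, created, data),
    )
    conn.commit()
    return cur.lastrowid


def find_by_metadata(doc_type=None, author=None):
    """Return the file names matching the given metadata filters, in name order."""
    clauses, params = [], []
    if doc_type is not None:
        clauses.append("doc_type = ?")
        params.append(doc_type)
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute(
        "SELECT file_name FROM documents" + where + " ORDER BY file_name", params
    )
    return [row[0] for row in rows]


store_document("contract.pdf", "pdf", "alice", "2020-01-15", b"%PDF-1.4 ...")
store_document("notes.docx", "word", "bob", "2020-02-01", b"PK\x03\x04 ...")
store_document("invoice.pdf", "pdf", "bob", "2020-03-09", b"%PDF-1.4 ...")

print(find_by_metadata(doc_type="pdf"))
print(find_by_metadata(doc_type="pdf", author="bob"))
```

Which metadata columns to create depends on the answer to the question above; the ones shown here are placeholders.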
Azure's documentation suggests using blobs to index documents such as MS Word and PDF files. We have an Azure SQL Server database with thousands of documents stored in a table's nvarchar(MAX) column. The content of each record is plain English text: the application converted the PDF / MS Word files into plain text and stored the result in the database.
My question is whether it is possible to index the "documents" stored in the database in the same way that Azure indexes blobs. I know how to create an Azure SQL indexer, but I'd like to make sure that the underlying search behaves the same for documents stored in a database table as it does for blobs.
Thanks in advance!
This is not currently possible - document extraction can only be done on blobs stored in Azure storage.
I'm new to elastic search and I have a basic question.
I want to load data from a database and search it with Elasticsearch in an MVC.NET project, but because of the kind of data I have in my table, I can't convert all of it to JSON and search it with Elasticsearch. How should I populate Elasticsearch from the database in an MVC.NET project? I'm not asking for a complete solution, since that isn't feasible here, just a brief, general explanation. Thank you very much.
First of all, you need to model your SQL data for Elasticsearch, because Elasticsearch is a NoSQL, document-oriented database/search engine.
You need an indexer to index SQL data to ElasticSearch.
Get all the columns associated with one record that you want to search in ElasticSearch from your SQL database (use joins if data is in multiple tables).
Use a dedicated stored procedure to fetch only the data you need, construct a document class, serialize it to JSON, and index it in your Elasticsearch cluster.
Use the Elasticsearch.Net client, which neatly exposes the bulk index APIs.
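The steps above can be sketched roughly as follows. This is an illustration in Python rather than .NET, and it rests on assumptions: SQLite stands in for the SQL database, the tables, columns and the index name "products" are hypothetical, and the code only builds the newline-delimited JSON body that Elasticsearch's bulk endpoint accepts instead of calling a client library.

```python
import json
import sqlite3

# SQLite stands in for the SQL database; tables, columns and the index name
# "products" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'Books'), (2, 'Music');
    INSERT INTO products VALUES (10, 'SQL Basics', 1), (11, 'Jazz Hits', 2);
    """
)


def fetch_documents():
    """Join the tables so that each row becomes one flat, searchable document."""
    rows = conn.execute(
        "SELECT p.id, p.title, c.name FROM products p "
        "JOIN categories c ON c.id = p.category_id ORDER BY p.id"
    )
    return [{"id": r[0], "title": r[1], "category": r[2]} for r in rows]


def build_bulk_body(index_name, docs):
    """Build the newline-delimited JSON body expected by the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: which index and document id to write to.
        lines.append(
            json.dumps({"index": {"_index": index_name, "_id": str(doc["id"])}})
        )
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("products", fetch_documents())
print(body)
```

In a real project the client's bulk API would build and send this payload for you; the point of the sketch is only the shape of the work: one joined query, one document per record, one bulk request per batch.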
Hope this will get you started. Have fun
We have an application that stores Word and PDF documents in a share on a server. I'm looking into the possibility of storing these as BLOBs in the associated Microsoft SQL database instead, which seems like it's probably a good idea.
Separately, an idea which I'm investigating is the possibility of allowing users to easily view all of the documents in the share associated with a case (let's imagine they're grouped into folders by case) as one continuous stream on a tablet, as if they were all one big PDF file.
I think I've worked out how to do the latter, running a web service to convert the Word documents to PDFs and then concatenate them and the extant PDFs. But that's if we continue to store the documents as files in an NTFS share. What if we stored the documents as BLOBs in MSSQL instead?
Is there a way to concatenate BLOB data so that for every, say, 10 BLOB records (which might represent Word or PDF files), I could create an 11th record which was a concatenation of the other 10 as one giant PDF?
SQL Server BLOBs are not an effective way of storing files. SQL Server 2008 brought about a better mechanism for this called FILESTREAM (http://technet.microsoft.com/en-us/library/gg471497.aspx), which stores the files directly on the file system while they remain managed by SQL Server.
As for the files, you would not be able to simply concatenate the PDF files to form one continuous file, but there are several libraries that you could use to merge them, potentially on the fly. That would also remove the need to store the concatenated document.
Is there a way to search for a specific varchar value across all fields in a SQL Server DB?
Sounds like you need full-text search: http://msdn.microsoft.com/en-us/library/ms142571.aspx
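Full-text search only covers columns that have been added to a full-text index. For a one-off hunt across every text column, a brute-force scan driven by the database catalog is another option. The sketch below shows that swapped-in technique, not full-text search itself, and it makes assumptions: SQLite stands in for SQL Server (where the column list would come from INFORMATION_SCHEMA.COLUMNS instead), and the tables and the helper name are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; the tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO customers VALUES (1, 'Acme Ltd', 'Leeds'), (2, 'Widget Co', 'Acme Park');
    INSERT INTO orders VALUES (1, 'deliver to Acme Ltd'), (2, 'no note');
    """
)


def search_all_text_columns(needle):
    """Return sorted (table, column, match_count) for text columns containing needle."""
    hits = []
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    for table in tables:
        # Each PRAGMA row is (cid, name, declared_type, notnull, default, pk).
        for col in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            name, decl_type = col[1], col[2].upper()
            if "TEXT" not in decl_type and "CHAR" not in decl_type:
                continue  # skip non-text columns
            count = conn.execute(
                f'SELECT COUNT(*) FROM "{table}" WHERE "{name}" LIKE ?',
                (f"%{needle}%",),
            ).fetchone()[0]
            if count:
                hits.append((table, name, count))
    return sorted(hits)


print(search_all_text_columns("Acme"))
```

Every call scans every text column in full, so this is only reasonable for occasional use on small databases; for repeated searches, the full-text index suggested above is the better fit.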
I'm creating a news portal site that stores many news items. Every news item has HTML content. I'm using SQL Server 2005. I have 2 choices:
Save the news data to an ntext field.
Save the news data to an HTML file and save the file name to an nvarchar field.
Which way gives the best performance and the quickest search? If I choose the second way, then to search the news I have to loop over every file and search inside each one.
Which is best?
Is there another way?
EDIT
My news count may grow to over 100,000. The count is currently 1,000, but the SQL Server database size is already 60 MB.
Use nvarchar(max), not ntext, for storage. Use full-text search for searching. Use FILESTREAM storage if the content consists of documents that have to be accessed through the Win32 API.
Querying varbinary(max) and xml Columns (Full-Text Search)
Best Practices for Integrated Full Text Search
SQL Server 2005 Full-Text Queries on Large Catalogs: Lessons Learned
Using FILESTREAM with Other SQL Server Features