Azure's documentation suggests that we should use blobs in order to index documents such as MS Word, PDF, etc. We have an Azure SQL Server database holding thousands of documents in a table's nvarchar(MAX) field. The content of each record is plain English text: the application converted each PDF / MS Word file to plain text and stored it in the database.
My question: is it possible to index the "documents" stored in the database in the same way Azure does for blobs? I know how to create an Azure SQL indexer, but I'd like to make sure that the underlying search behaves the same for documents stored in a database table as it does for blobs.
Thanks in advance!
This is not currently possible - document extraction can only be done on blobs stored in Azure storage.
Related
I am trying to get some Oracle databases to work in MSSQL on Azure.
We seem to have most things working except the ability to search on file attachments, e.g. Word, PDF, etc.
Oracle lets us index a column in a table that uses a filepath link.
In MSSQL a column in a table can be added using:
[filepointer] VARBINARY(MAX) FILESTREAM
and then an index can be set up so that the files can be searched.
I'm trying to use the same Oracle table with this extra column to do the search in Azure, e.g.
SELECT * FROM [TESTATTACHSRCH] WHERE CONTAINS([filepointer], 'Text in File')
I managed to get this working in MSSQL with the Oracle table we had used with this extra special column.
I know that FILESTREAM is not currently supported on Azure, and dropping FILESTREAM is not an option: the files we are searching are large, and storing them inline would add too much size to the database.
I am hoping there is still a way to achieve this on Azure, even if Azure cannot do it on its own and third-party software is needed to do something similar.
Hopefully somebody has hit the same roadblock and can provide some advice.
Thanks
The alternative would be to move the attachments to blobs in Azure Storage. You can then set up Azure Cognitive Search, which can extract and index content from PDF, Word, PPT, etc.
I don't think it will work with SQL Server / Azure SQL Database alone. You can find more information about what I've described at the following links:
https://learn.microsoft.com/en-us/azure/search/search-indexer-overview
https://learn.microsoft.com/en-us/azure/search/search-howto-create-indexers?tabs=indexer-portal
We will be writing several hundred thousand rows of data to an Azure Table Storage Container. The data is made up of 4 columns, 1 of which contains a lot of JSON text which is the main column I'm interested in.
How can I query this data using T-SQL? I was hoping to join this with some existing data we currently hold in a table on SQL Server too.
I am new to Azure Storage and am trying to work out if I have to query the data directly or can I get it to my SQL Server to perform some more detailed querying? It is being stored on Azure to start with due to ease and cost.
Azure Table storage does not support SQL: https://db-engines.com/en/system/Microsoft+Azure+Table+Storage%3BMicrosoft+SQL+Server
If you store your data in blob storage, you might be able to use PolyBase to query it from your SQL Server: https://learn.microsoft.com/en-us/sql/relational-databases/polybase/polybase-guide?view=sql-server-ver15
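If copying the rows into a relational table turns out to be acceptable, the JSON column can be flattened first and then joined with plain SQL. The following is only a rough sketch of that idea: the entity shape, the `Payload` and `CustomerId` column names, and the `customer` table are all hypothetical, and an in-memory SQLite database stands in for SQL Server so the example runs on its own.

```python
import json
import sqlite3

# Hypothetical entities, shaped like rows already read from Table storage.
# The column names are assumptions made for this illustration.
ENTITIES = [
    {"PartitionKey": "orders", "RowKey": "1001", "CustomerId": 7,
     "Payload": '{"total": 25.5, "currency": "GBP"}'},
    {"PartitionKey": "orders", "RowKey": "1002", "CustomerId": 8,
     "Payload": '{"total": 10.0, "currency": "EUR"}'},
]


def flatten(entity):
    """Turn one entity into a flat tuple by parsing its JSON text column."""
    payload = json.loads(entity["Payload"])
    return (entity["RowKey"], entity["CustomerId"],
            payload["total"], payload["currency"])


def load_and_join(entities):
    """Load flattened rows into a staging table and join them with existing data."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real SQL Server
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(7, "Ada"), (8, "Grace")])
    conn.execute("CREATE TABLE order_import "
                 "(row_key TEXT, customer_id INTEGER, total REAL, currency TEXT)")
    conn.executemany("INSERT INTO order_import VALUES (?, ?, ?, ?)",
                     [flatten(e) for e in entities])
    return conn.execute(
        "SELECT c.name, o.row_key, o.total, o.currency "
        "FROM order_import o JOIN customer c ON c.id = o.customer_id "
        "ORDER BY o.row_key"
    ).fetchall()


if __name__ == "__main__":
    for row in load_and_join(ENTITIES):
        print(row)
```

Against a real SQL Server the staging table would be filled by whatever import mechanism you already use; only the flatten-then-join shape is the point here.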
I'm new to Elasticsearch and I have a basic question.
I want to load data from a database and search it with Elasticsearch in an MVC.NET project, but because of the data I have in my database tables, I can't convert all of it to JSON and search it with Elasticsearch. How should I fill Elasticsearch with data from the database in an MVC.NET project? I'm not asking for the whole solution, since that would be impossible, just a brief, general explanation. Thank you very much.
First of all, you need to model your SQL data for ElasticSearch, since ElasticSearch is a NoSQL, document-oriented database/search engine.
You need an indexer to index SQL data to ElasticSearch.
Get all the columns associated with one record that you want to search in ElasticSearch from your SQL database (use joins if data is in multiple tables).
Use a dedicated stored procedure to fetch only the data you need, construct a document class from it, serialize it to JSON, and index it in your ElasticSearch cluster.
Use the ElasticSearch.net client, which exposes the bulk index APIs very neatly.
Hope this will get you started. Have fun
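The steps above can be sketched roughly as follows. This is an illustration in Python rather than .NET, using only the standard library: an in-memory SQLite database stands in for the SQL source, the `product` and `category` tables and the `products` index name are hypothetical, and the function only builds the newline-delimited JSON body that the `_bulk` endpoint expects instead of sending it. In an MVC.NET project the client's bulk API would take care of that last part.

```python
import json
import sqlite3


def fetch_documents(conn):
    """Join related tables so each row holds everything one search document needs."""
    cur = conn.execute(
        "SELECT p.id, p.name, c.name "
        "FROM product p JOIN category c ON c.id = p.category_id "
        "ORDER BY p.id"
    )
    for product_id, name, category in cur:
        yield {"id": product_id, "name": name, "category": category}


def build_bulk_body(docs, index_name):
    """Build a bulk request body: one action line, then one document line, per doc."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": str(doc["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


def demo():
    """Create hypothetical sample tables and return the bulk body for them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
        INSERT INTO category VALUES (1, 'Kitchen'), (2, 'Garden');
        INSERT INTO product VALUES (1, 'Kettle', 1), (2, 'Rake', 2);
        """
    )
    return build_bulk_body(fetch_documents(conn), "products")


if __name__ == "__main__":
    print(demo(), end="")
```

Denormalizing in the query (the join) keeps each document self-contained, which is usually what a document-oriented index wants; batching many documents into one bulk body avoids a round trip per row.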
I have a number of documents in PDF, Word and email formats. How should I store these documents so that they can be searched by metadata?
Non-textual things like that can be stored in an Oracle database as a CLOB or BLOB field.
What sort of metadata do you want to be able to search on?
I'm creating a news portal site that stores a large number of news items. Every news item has HTML data. I'm using SQL Server 2005. I have 2 choices:
Save news data to ntext field.
Save news data to html file and save file name to nvarchar field.
Which way gives the best performance and the quickest search? If I choose the second way, then when I search the news I would have to loop over every file and search each one.
Which is best?
Is there another way?
EDIT
My news count may grow to more than 100,000. The count is 1,000 now, but the SQL Server database size is already 60 MB.
Use nvarchar(max), not ntext, for storage. Use full-text search for searching. Use FILESTREAM storage if the content consists of documents that have to be accessed through the Win32 API.
Querying varbinary(max) and xml Columns (Full-Text Search)
Best Practices for Integrated Full Text Search
SQL Server 2005 Full-Text Queries on Large Catalogs: Lessons Learned
Using FILESTREAM with Other SQL Server Features