Load data to Elasticsearch from SQL Server

I'm new to Elasticsearch and I have a basic question.
I want to load data from a database and search it with Elasticsearch in an MVC.NET project, but given the amount of data in my database's tables, I can't convert all of it to JSON by hand to search it with Elasticsearch. How should I fill Elasticsearch with data from the database in an MVC.NET project? I don't want the whole solution, as that would be impossible here; just a general, brief explanation. Thank you very much.

First of all, you need to model your data from SQL to Elasticsearch, since Elasticsearch is a NoSQL, document-oriented database/search engine.
Then you need an indexer to index the SQL data into Elasticsearch:
Get all the columns associated with one record that you want to search in Elasticsearch from your SQL database (use joins if the data lives in multiple tables).
Use a dedicated stored procedure to fetch only the needed data, construct a document class from it, serialize it to JSON, and index it into your Elasticsearch cluster.
Use the Elasticsearch.Net/NEST client, as it exposes the bulk index APIs very neatly; a sketch follows below.
Hope this gets you started. Have fun!
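For illustration, here is a minimal sketch of such an indexer using the NEST client. The stored procedure name, the document fields, and the "products" index are assumptions made up for the example, not details from the question:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Nest;

public class ProductDocument
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

public static class SqlToElasticIndexer
{
    public static void Run(string sqlConnectionString)
    {
        // 1. Pull only the columns needed for search, via a dedicated stored procedure.
        var docs = new List<ProductDocument>();
        using (var conn = new SqlConnection(sqlConnectionString))
        using (var cmd = new SqlCommand("usp_GetProductsForSearch", conn) { CommandType = CommandType.StoredProcedure })
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    docs.Add(new ProductDocument
                    {
                        Id = reader.GetInt32(0),
                        Name = reader.GetString(1),
                        Description = reader.GetString(2)
                    });
                }
            }
        }

        // 2. Serialize and bulk-index the documents into the cluster.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200")).DefaultIndex("products");
        var client = new ElasticClient(settings);
        var response = client.IndexMany(docs);   // NEST serializes to JSON and sends one bulk request
        if (!response.IsValid)
        {
            // Inspect response.ItemsWithErrors / response.DebugInformation on failure.
        }
    }
}

Such an indexer can be run once for the initial load and then on a schedule (or hooked into your write path) to keep the index fresh.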

Related

Compare data between Elastic Search and RDS

We are moving our data from RDS to Elasticsearch; the data volume is around 80 GB, with around 90 million records.
We have been using Elasticsearch's bulk API to index the data, but we now want to take an entire dump of the Elastic records and compare them with our RDS data, to make sure everything was moved over correctly. In Elasticsearch we have been combining multiple RDS tables into a single index. Is there any way to dump an Elastic index into a document or into multiple files?
There's no out-of-the-box solution, but there's an old post on the Elastic website which explains some approaches:
https://www.elastic.co/blog/elasticsearch-verifying-data-integrity-with-external-data-stores
Hope it helps.
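One common way to take such a dump yourself is the scroll API, paging through every document and writing one JSON document per line to a file. A minimal sketch with the .NET NEST client, assuming an index name parameter and Newtonsoft.Json for serialization (both purely illustrative):

using System.Collections.Generic;
using System.IO;
using System.Linq;
using Nest;
using Newtonsoft.Json;

public static class IndexDumper
{
    public static void Dump(IElasticClient client, string indexName, string outputPath)
    {
        using (var writer = new StreamWriter(outputPath))
        {
            // Open a scroll context and fetch the first page.
            var response = client.Search<Dictionary<string, object>>(s => s
                .Index(indexName)
                .Size(1000)
                .Scroll("2m")
                .Query(q => q.MatchAll()));

            while (response.Documents.Any())
            {
                foreach (var doc in response.Documents)
                    writer.WriteLine(JsonConvert.SerializeObject(doc));  // one JSON document per line

                // Fetch the next page using the scroll id.
                response = client.Scroll<Dictionary<string, object>>("2m", response.ScrollId);
            }

            // Release the scroll context on the server.
            client.ClearScroll(c => c.ScrollId(response.ScrollId));
        }
    }
}

The resulting line-delimited JSON can then be compared record by record against an export from RDS.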

Indexing text documents by Azure Search Service

Azure's documentation suggests that we should leverage blobs to be able to index documents such as MS Word, PDF, etc. We have an Azure SQL database with thousands of documents stored in a table's nvarchar(MAX) field. The content of each database record is plain English text; in fact, the application converted the PDF / MS Word files into plain text and stored that in the database.
My question is: would it be possible to index the stored "documents" in the database the same way Azure would index blobs? I know how to create an Azure SQL indexer, but I'd like to make sure that the underlying search behaves the same way against documents stored in a database table as it does against blobs.
Thanks in advance!
This is not currently possible - document extraction can only be done on blobs stored in Azure Storage.

Solr with SQL Server

I am working on a POC of using Solr with SQL Server.
I have a very complex data model in SQL Server which requires a lot of joins and scalar functions to strip out markup and do other clean-up.
This is turning out to be a performance bottleneck. To address this, we are considering NoSQL (MongoDB) or Solr with SQL Server as our options.
With MongoDB, we would attach replication events to all CRUD operations, carrying the data over to MongoDB after every successful insert, update, or delete on SQL Server. When we have to perform a search, we would search directly against the Mongo collections.
This sounds very appealing, since the 32 tables joined in this search could collapse into 2 collections in MongoDB.
On the other hand, we are also exploring Solr with SQL Server using the DataImportHandler.
My concern is that, based on this article http://www.codewrecks.com/blog/index.php/2013/04/29/loading-data-from-sql-server-to-solr-with-a-data-import-handler/ , I would have to do an import for each entity.
How does a joined search work with Solr? Should I import each table from SQL into Solr and then write join logic against the Solr APIs?
Can I import multiple entities at once? Should I create a view for the expected (denormalized) result set and import that view into Solr?
Will these imports have to be done at regular intervals? After an import, if there is new data, does the Solr API reflect that change, or do I have to run an import first and then search against the Solr API?
Finally, can I compare Solr with MongoDB? If anyone has done this kind of evaluation, please share your thoughts.

Elastic Search SQL Server

We are trying to implement free text search using Elasticsearch.
The plan is to use:
OurApplication
|
NEST
|
Elastic Search
|
?????????????
|
SQL Server
The database is SQL Server, and Create, Update and Delete operations on the tables are performed by multiple applications.
How can I populate and refresh the indexes in Elasticsearch?
I have looked at the JDBC river route, but people are saying it will be deprecated in future releases.
If I use the JDBC river, how can I refresh the indexes when updates happen?
Put simply, whenever an operation such as an add, update, or delete is performed, call the corresponding Elasticsearch function. For example, when a record is updated, call a function that updates that record (by index and type) in the Elasticsearch data; do the same for write and delete operations. I used this approach for auto-suggestion and free text search in my term project; a sketch of the pattern follows below.
Regards
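To illustrate the idea, here is a minimal sketch with the NEST client. The Article type, the "articles" index, and the method names are assumptions for the example; the point is simply to mirror each SQL write with the matching Elasticsearch call:

using Nest;

public class Article
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
}

public class ArticleSearchSync
{
    private readonly IElasticClient _client;

    public ArticleSearchSync(IElasticClient client)
    {
        _client = client;
    }

    // Call after a successful SQL INSERT.
    public void OnCreated(Article article)
    {
        _client.Index(article, i => i.Index("articles").Id(article.Id));
    }

    // Call after a successful SQL UPDATE; re-indexing overwrites the stored document.
    public void OnUpdated(Article article)
    {
        _client.Index(article, i => i.Index("articles").Id(article.Id));
    }

    // Call after a successful SQL DELETE.
    public void OnDeleted(int id)
    {
        _client.Delete<Article>(id, d => d.Index("articles"));
    }
}

Note that this only works if all writes go through your application; in the questioner's case, where multiple applications write to the tables, a scheduled importer may be the safer route.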
The best solution for that is the JDBC importer.
The Java Database Connection (JDBC) importer allows you to fetch data from JDBC sources for indexing into Elasticsearch.
The JDBC importer was designed for tabular data. If you have tables with many joins, the JDBC importer is limited in its ability to reconstruct deeply nested JSON objects and to process object semantics like object identity. Though it would be possible to extend the JDBC importer with a mapping feature where all the object properties could be specified, the current solution is focused on rather simple tabular data streams.
You use a script file to define the JDBC database connection and the query; the data is stored in an index, and your application searches against that index.
You can refresh the index by setting a cron expression in the schedule parameter, which enables repeated, time-scheduled runs.
Example of a schedule parameter:
"schedule" : "0 0-59 0-23 ? * *"
This executes the JDBC importer every minute of every hour, every day of the week/month/year, so the index gets fresh data every minute.
The JDBC importer also supports MySQL, PostgreSQL, and other JDBC sources.
The JDBC importer provides many more features.
For details: https://github.com/jprante/elasticsearch-jdbc
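For reference, an importer definition script looks roughly like the following; the connection string, credentials, SQL statement, and index name are placeholders, not values from the question:

{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:sqlserver://localhost:1433;databaseName=mydb",
        "user" : "user",
        "password" : "password",
        "sql" : "SELECT id AS _id, name, description FROM products",
        "index" : "products",
        "schedule" : "0 0-59 0-23 ? * *"
    }
}

A column aliased to _id becomes the Elasticsearch document id, so re-running the import updates existing documents instead of duplicating them.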

DB technology for efficient search in tabular data?

We have a repository of tables: around 200 tables, each of which can have thousands of rows. All the tables originally come from Excel sheets.
Each table has a different schema. All data is text or numbers.
We would like to create an application that allows efficient free text search across all the tables (we define which columns are searched in each table); speed is important.
The main dilemma is which DB technology we should choose.
We created a mock-up by importing all the tables into MS SQL Server and creating a full text index over them; the search is done using the CONTAINS keyword. This solution works well for a small number of tables, but it doesn't scale.
We have thought about a NoSQL solution, but we don't yet have any experience with it.
Our limitation (which unfortunately I cannot affect): Windows servers only. But we can install whatever we want on them.
Thank you.
Check out Elasticsearch! It's a search server based on Apache Lucene with a clean, JSON-over-REST API. Although it's usually used as a search index alongside a primary database, it can also be used stand-alone, so you may want to write a routine that copies a few of your tables into it and try it out.
http://www.elasticsearch.org/
http://en.wikipedia.org/wiki/ElasticSearch
Comparison of ElasticSearch and Apache Solr (another Lucene-based search server):
https://docs.google.com/present/view?id=dc6zhtt5_1frfxwfff&pli=1
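To give a feel for stand-alone use on your (Windows/.NET-friendly) setup, here is a minimal sketch with the .NET NEST client; the Row type, the "tables" index, and the field names are assumptions for the example:

using System;
using Nest;

public class Row
{
    public string TableName { get; set; }
    public string SearchableText { get; set; }  // the columns you designate as searchable, concatenated
}

public static class FreeTextSearchDemo
{
    public static void Main()
    {
        var client = new ElasticClient(
            new ConnectionSettings(new Uri("http://localhost:9200")).DefaultIndex("tables"));

        // Index one row; in practice you would bulk-index every row of all 200 tables.
        // (A newly indexed document becomes searchable after the index refresh, ~1 s by default.)
        client.Index(new Row { TableName = "Budget", SearchableText = "office supplies 1200" },
                     i => i.Index("tables"));

        // Free text search over the designated text.
        var result = client.Search<Row>(s => s
            .Query(q => q.Match(m => m.Field(f => f.SearchableText).Query("office supplies"))));

        foreach (var hit in result.Documents)
            Console.WriteLine(hit.TableName);
    }
}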
