Elasticsearch with SQL Server

We are trying to implement free-text search using Elasticsearch.
The plan is to use:
OurApplication
|
NEST
|
Elasticsearch
|
?????????????
|
SQL Server
The database is SQL Server, and Create, Update, and Delete operations on the tables are performed by multiple applications.
How can I populate and refresh the indexes in Elasticsearch?
I looked into the River JDBC route, but people are saying it will be deprecated in future releases?
If I use River JDBC, how can I refresh the indexes when updates happen?

In simple terms, what you can do is: whenever an operation such as an update, insert, or delete is performed, call the corresponding Elasticsearch function. For example, when a record is updated, call a function that updates that record (by index and type) in the Elasticsearch data. Do the same for the write and delete operations. I used this same approach for auto-suggestion and free-text search in my term project.
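With NEST, those per-operation calls could look roughly like the sketch below. This is a minimal illustration only; the Product class, its Id property, the products index, and the cluster URL are all hypothetical stand-ins for your own model.

using System;
using Nest;

public class Product                       // hypothetical document class
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class SearchSync
{
    static readonly ElasticClient Client = new ElasticClient(
        new ConnectionSettings(new Uri("http://localhost:9200")).DefaultIndex("products"));

    // Call after an INSERT: creates (or overwrites) the document with that id.
    public static void OnInserted(Product p) =>
        Client.Index(p, i => i.Index("products"));

    // Call after an UPDATE: applies a partial update to the matching document.
    public static void OnUpdated(Product p) =>
        Client.Update<Product>(p.Id, u => u.Index("products").Doc(p));

    // Call after a DELETE: removes the matching document.
    public static void OnDeleted(Product p) =>
        Client.Delete<Product>(p.Id, d => d.Index("products"));
}

You would call these from the same code paths (or event handlers) that perform the SQL writes.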
Regards

The best solution for that is the JDBC importer.
The Java Database Connectivity (JDBC) importer allows you to fetch data from JDBC sources for indexing into Elasticsearch.
The JDBC importer was designed for tabular data. If you have tables with many joins, the JDBC importer is limited in how it can reconstruct deeply nested objects into JSON and process object semantics like object identity. Though it would be possible to extend the JDBC importer with a mapping feature where all the object properties could be specified, the current solution is focused on rather simple tabular data streams.
You can use a script file for the JDBC database connection and for the query. The data is stored in an index, and your application then searches against that index.
You can refresh the index by setting a cron expression in the schedule parameter, which enables repeated, time-scheduled runs.
Example of a schedule parameter:
"schedule" : "0 0-59 0-23 ? * *"
This executes the JDBC importer every minute, every hour, every day of the week/month/year, so the index gets fresh data every minute.
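Put in context, a complete importer definition with that schedule might look something like the sketch below. The connection details, SQL statement, and index/type names are placeholders, not a definitive configuration:

{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:sqlserver://localhost:1433;databaseName=MyDatabase",
    "user" : "myuser",
    "password" : "mypassword",
    "sql" : "select id as _id, name, description from products",
    "index" : "products",
    "type" : "product",
    "schedule" : "0 0-59 0-23 ? * *"
  }
}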
The JDBC importer also supports MySQL and PostgreSQL.
The JDBC importer provides many features.
For details: https://github.com/jprante/elasticsearch-jdbc

Related

How to push data from an on-premises database to Tableau CRM

We have an on-premises Oracle database installed on a server. We have to create some charts/dashboards with Tableau CRM from that on-premises data. Note that Tableau CRM is not Tableau Online; it is a Tableau version for the Salesforce ecosystem.
Tableau CRM has APIs, so we can push data to it or can upload CSV programmatically to it.
So, what can be done is:
Run a Node.js app on the on-premises server, pull data from the Oracle DB, and then push to Tableau CRM via the TCRM API.
Run a Node.js app on the on-premises server, pull data from the Oracle DB, create a CSV, and push the CSV via the TCRM API.
I have tested the 2nd option and it is working fine.
But, as you all know, it is not efficient, because I have to run a cron job and schedule the process multiple times a day, and I have to query the full table all the time.
I am looking for a better approach. Any other tools/technology you know to have a smooth sync process?
Thanks
The second method you described in the question is a good solution. However, you can optimize it a bit.
I have to query the full table all the time.
This can be avoided. If you take a look at the documentation of the SObject InsightsExternalData, you can see that it has a field named Operation which takes one of these values: Append, Delete, Overwrite, Upsert.
What you will have to do is, when you push data to Tableau CRM, use the Append operation and push only the records that don't already exist in TCRM. That way you only query the delta records from your database. This reduces the size of the CSV you have to push, and since the size is smaller, it takes less time to upload into TCRM.
However, to implement this solution you need two things on the database side.
A unique identifier that uniquely identifies every record in the database
A DateTime field
Once you have these two, you have to write a query that sorts all the records in ascending order of the DateTime field and takes only the records that come after the last record you pushed into TCRM. That way your result set contains only the delta records that you don't yet have in TCRM. After that, you can use the same pipeline you built to push the data.
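As an illustration, the delta query could look something like this sketch. The assets table, its columns, and the :last_pushed_at bind variable (the watermark your Node.js app saves after each successful push) are all hypothetical:

SELECT id, name, status, updated_at
FROM assets
WHERE updated_at > :last_pushed_at  -- watermark from the previous successful push
ORDER BY updated_at ASC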

How to perform Lookups in Azure Data Factory?

I'm an SSIS developer. I do a lot of SQL stored-procedure lookups in SSIS, but coming to Azure Data Factory, I have no idea how to perform a lookup using a SQL stored procedure.
Could anyone please guide me on this?
Thanks in advance!
Jay
Azure Data Factory (ADF) is more of an ELT tool than an ETL tool, so direct lookups are not supported. Instead, this type of operation, along with other transforms, is pushed down into the compute you are actually using. For example, if you are moving data to SQL Server, Azure SQL Database, or Azure SQL Data Warehouse, you would ensure all data is on the same server and use a Stored Procedure task to execute the lookups using T-SQL and joins. If you are using Azure Data Lake Analytics (ADLA), you would use the U-SQL Activity to run U-SQL or execute ADLA stored procedures, again doing lookups via joins or custom U-SQL code such as a Combiner, Applier, or Reducer. In fact, you can use any of the ADF compute options, like SQL, HDInsight (including Hive, Pig, MapReduce, Streaming, and Spark scripts), Machine Learning, or custom .NET activities.
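For example, the SSIS-style lookup can become a plain T-SQL join inside a stored procedure that ADF then executes. A sketch with hypothetical staging and dimension tables:

-- The "lookup" is the join that resolves the surrogate key.
CREATE PROCEDURE dbo.LoadFactSales
AS
BEGIN
    INSERT INTO dbo.FactSales (ProductKey, SaleDate, Amount)
    SELECT d.ProductKey,
           s.SaleDate,
           s.Amount
    FROM staging.Sales AS s
    INNER JOIN dbo.DimProduct AS d
        ON d.ProductCode = s.ProductCode;
END;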
So you need to think about things differently with ADF. Have a look through this article to gain greater understanding of transforming data in ADF:
Transform data in Azure Data Factory
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-data-transformation-activities
As an aside, I would rarely use Lookups in SSIS, as performance in early versions used to be poor. Although this has been improved in later versions, generally if you can do it in SQL, you probably should. This pattern harnesses the power of SQL Server rather than dragging data up into the SSIS pipeline, e.g. for the purposes of lookups (which are essentially joins), and pushing the data back out again. I reserve Data Flow transformations mainly for when non-relational data is involved, e.g. XML, or joining your email server with relational data. This is my personal view anyway : )

Logstash - Only import new or updated rows using JDBC

Right now I've got Logstash importing a miniature version of my MSSQL database, using the JDBC plugin. I've got each JDBC input scheduled to run every minute to update Elasticsearch. To update, I'm currently just re-importing every single table and row in my database, and adding all rows to Elasticsearch. When I begin using the full database, though, this will be very inefficient, as it will take more than a minute to go through the entire database.
Is there another way to keep Elasticsearch in sync with my database? I've tried to use the sql_last_value parameter to only import new rows into the database, but this only works when the id for my database table is a number, and each new entry in the table has a greater number than the last. Some tables in the database have an id which can be completely random (e.g. "43f4-f43ef-e44454r"), which cannot be used with sql_last_value, as such values are impossible to compare. I'm also unable to modify the actual database at all, which cuts out a lot of my potential solutions. I feel as if I am out of options here, so can anyone suggest anything that I can try?
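For reference, the sql_last_value pattern described above looks roughly like this in a JDBC input configuration. This is a sketch only; the connection details, driver path, and the products table are placeholders:

input {
  jdbc {
    jdbc_driver_library => "/path/to/sqljdbc4.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433;databaseName=MyDatabase"
    jdbc_user => "myuser"
    jdbc_password => "mypassword"
    schedule => "* * * * *"
    # Only rows with an id greater than the last one seen are fetched,
    # which is why this breaks down for random string ids.
    statement => "SELECT * FROM products WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
  }
}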

Azure Search - Creating a dedicated search table in SQL, for using with an Indexer

I'm building an Assets search engine.
The data I need to be indexed for each asset is scattered across multiple tables in the SQL database.
Also, there are many events in the application that will trigger updates to the asset's indexed fields (Draft, Rejected, Produced, ...).
I'm considering creating a new denormalized table in the SQL database that would exist solely for the Azure Search Index.
It would be an exact copy of the Azure Search Index fields.
The application would be responsible to fill and update the SQL table, through various event handlers.
I could then use a scheduled Azure SQL indexer to automatically import the data into the Azure Search index.
PROS:
We are used to dealing with SQL table operations, so the application code remains standard; no need to learn the Azure Search API
Both the transactional and the search models are updated in the same SQL transaction (atomic). The indexer then updates the index in an eventually consistent manner and handles the retry logic.
Built-in support for change detection with the SQL Integrated Change Tracking Policy (see the sketch at the end of this question)
CONS:
Indexed data space usage is duplicated in the SQL database
Delayed Index update (minimum 5 minutes)
Do you see any other pros and cons ?
[EDIT 2017-01-26]
There is another big PRO for our usage.
During development, we regularly add/rename/remove fields from the Azure index. In its current state, some schema modifications to an Azure index are not possible; we have to drop and re-create the index.
With a dedicated table containing the data, we simply issue a Reset to our indexer endpoint and the new index gets automatically re-populated.
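For what it's worth, here is a sketch of the dedicated table and the change tracking setup mentioned in the PROS above; the database, table, and column names are hypothetical:

CREATE TABLE dbo.AssetSearchIndex (
    AssetId INT NOT NULL PRIMARY KEY,
    Title NVARCHAR(256) NULL,
    Status NVARCHAR(50) NULL,
    UpdatedOn DATETIME2 NOT NULL
);

-- Enable SQL Integrated Change Tracking so the Azure Search indexer
-- only picks up changed rows on each scheduled run.
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.AssetSearchIndex
ENABLE CHANGE_TRACKING;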

Load data into Elasticsearch from SQL Server

I'm new to Elasticsearch and I have a basic question.
I want to load data from a database and search it using Elasticsearch in an MVC.NET project, but because of the data I have in my database's tables, I can't convert all of it to JSON and search it with Elasticsearch. How should I fill Elasticsearch with data from the database in an MVC.NET project? I don't want the whole solution, because that is impossible; just a general and brief explanation. Thank you very much.
First of all, you should be able to model your data from SQL into Elasticsearch,
as Elasticsearch is a NoSQL, document-oriented database/search engine.
You need an indexer to index the SQL data into Elasticsearch.
Get all the columns associated with one record that you want to search in Elasticsearch from your SQL database (use joins if the data is in multiple tables).
Use a dedicated stored procedure to get only the needed data, construct a document class, serialize it to JSON, and index it in your Elasticsearch cluster.
Use the Elasticsearch.Net/NEST client, as it very neatly exposes the bulk index APIs.
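For instance, bulk indexing with NEST might look like the sketch below. The AssetDocument class, the assets index, and the LoadFromStoredProcedure helper are hypothetical placeholders for your own data access code:

using System;
using System.Collections.Generic;
using Nest;

public class AssetDocument                 // hypothetical document class built from SQL rows
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}

public static class AssetIndexer
{
    public static void BulkIndex(IEnumerable<AssetDocument> documents)
    {
        var client = new ElasticClient(
            new ConnectionSettings(new Uri("http://localhost:9200")));

        // One round trip for the whole batch instead of one request per row.
        var response = client.Bulk(b => b
            .Index("assets")
            .IndexMany(documents));

        if (!response.IsValid)
        {
            // response.ItemsWithErrors lists the per-document failures.
        }
    }
}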
Hope this will get you started. Have fun
