Solr with SQL Server

I am working on a POC of using Solr with SQL Server.
I have a very complex data model in SQL Server which requires a lot of joins and scalar functions to strip markup and other things.
This is turning out to be a performance bottleneck. To address this issue, we have considered NoSQL (MongoDB) or Solr with SQL Server as our options.
With MongoDB, we would attach replication events to all CRUD operations, which would carry the data over to MongoDB after a successful insert, update or delete on SQL Server. When we have to perform a search, we would run it directly against the Mongo collections.
This sounds very appealing, as the 32 tables joined in this search could collapse into 2 collections in MongoDB.
On another note, we are also exploring Solr with SQL Server using the DataImportHandler.
My concern is that, based on this article http://www.codewrecks.com/blog/index.php/2013/04/29/loading-data-from-sql-server-to-solr-with-a-data-import-handler/ I have to do an import for each entity.
How does a joined search work with Solr? Should I import each table from SQL Server into Solr and then write join logic against the Solr APIs?
Can I import multiple entities at once? Should I create a view for the expected (denormalized) result set and import that view into Solr?
Will these imports have to be done at regular intervals? After an import, if there is new data, does the Solr API reflect that change, or do I have to run an import first and then search against the Solr API?
Finally, can I compare Solr with MongoDB? If anyone has done this kind of evaluation, please share your thoughts.
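To make the denormalized-view option concrete, here is a rough sketch of what I have in mind (not the DataImportHandler itself, just the same idea driven from a script): index rows from a hypothetical denormalized view into a Solr core and query it afterwards. It assumes the pysolr and pyodbc packages, and the view, core and field names are placeholders.

    # Sketch: push rows from a denormalized SQL Server view into Solr.
    # The view dbo.SearchResults, the core name "search" and the field names
    # are hypothetical; requires pyodbc and pysolr.
    import pyodbc
    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/search", always_commit=True)

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=MyDb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT Id, Name, Description, Category FROM dbo.SearchResults")

    columns = [c[0] for c in cursor.description]
    docs = [dict(zip(columns, row)) for row in cursor.fetchall()]

    # Each row becomes one flat Solr document; the join logic lives in the view,
    # not in Solr, so searches afterwards need no joins.
    solr.add(docs)

    results = solr.search("Description:widget", rows=10)
    for hit in results:
        print(hit["Id"], hit["Name"])

Whether something like this runs as a scheduled full import or only over changed rows is essentially the same question as with the DataImportHandler's delta imports.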

Related

How to push data from an on-premises database to Tableau CRM

We have an on-premises Oracle database installed on a server. We have to create some charts/dashboards with Tableau CRM on that on-premises data. Note that Tableau CRM is not Tableau Online; it is the Tableau version for the Salesforce ecosystem.
Tableau CRM has APIs, so we can push data to it or upload CSVs to it programmatically.
So, what can be done is:
Run a Node.js app on the on-premises server, pull data from the Oracle DB, and then push it to Tableau CRM via the TCRM API.
Run a Node.js app on the on-premises server, pull data from the Oracle DB, create a CSV, and push the CSV via the TCRM API.
I have tested with the 2nd option and it is working fine.
But, as you all know, it is not efficient, because I have to run a cron job and schedule the process multiple times a day, and I have to query the full table all the time.
I am looking for a better approach. Are there any other tools/technologies you know of for a smooth sync process?
Thanks
The second method you described in the question is a good solution. However, you can optimize it a bit.
I have to query the full table all the time.
This can be avoided. If you take a look at the documentation of the SObject InsightsExternalData, you can see that it has a field named Operation which takes one of these values: Append, Delete, Overwrite, Upsert.
What you will have to do is use the Append operation when you push data to Tableau CRM, and push only the records that don't exist in TCRM yet. That way you only query the delta records from your database. This reduces the size of the CSV you have to push, and since the size is smaller it takes less time to upload into TCRM.
However, to implement this solution you need two things on the database side.
A unique identifier that uniquely identifies every record in the database
A DateTime field
Once you have these two, you have to write a query that sorts all the records in ascending order of the DateTime field and takes only the records after the last unique ID you pushed into TCRM. That way your result set only contains the delta records that you don't yet have in TCRM. After that, you can use the same pipeline you built to push the data.
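For illustration, the delta-extraction part might look like the sketch below. It only covers pulling the new rows and writing the CSV; the table, column and driver names (my_table, updated_at, python-oracledb) are placeholders, and the resulting file would still be uploaded with Operation = Append through the InsightsExternalData pipeline described above.

    # Sketch of the delta extraction: pull only rows newer than the last
    # watermark instead of the full table. Table and column names are
    # hypothetical; uses the python-oracledb driver and the csv module.
    import csv
    from datetime import datetime

    import oracledb

    # Replace with the watermark stored after the previous push
    # (e.g. kept in a small state table or file).
    last_pushed_at = datetime(2024, 1, 1)

    conn = oracledb.connect(user="app", password="secret", dsn="dbhost/ORCLPDB1")
    cursor = conn.cursor()
    cursor.execute(
        "SELECT id, name, amount, updated_at FROM my_table "
        "WHERE updated_at > :since ORDER BY updated_at",
        since=last_pushed_at,
    )

    with open("delta.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Name", "Amount", "UpdatedAt"])
        writer.writerows(cursor.fetchall())

    # delta.csv is then pushed with Operation = "Append", and the watermark is
    # advanced to the newest updated_at value that was exported.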

Azure Search - Creating a dedicated search table in SQL for use with an indexer

I'm building an Assets search engine.
The data I need indexed for each asset is scattered across multiple tables in the SQL database.
Also, there are many events in the application that trigger updates to an asset's indexed fields (Draft, Rejected, Produced, ...).
I'm considering creating a new denormalized table in the SQL database that would exist solely for the Azure Search Index.
It would be an exact copy of the Azure Search Index fields.
The application would be responsible for filling and updating this SQL table, through various event handlers.
I could then use a scheduled Azure SQL indexer to automatically import the data into the Azure Search index.
PROS:
We are used to dealing with SQL table operations, so the application code remains standard and there is no need to learn the Azure Search API
Both the transactional and the search models are updated in the same SQL transaction (atomic). The indexer then updates the index in an eventually consistent manner and handles the retry logic.
Built-in support for change detection with SQL Integrated Change Tracking Policy
CONS:
The indexed data is duplicated in the SQL database, using extra storage
Delayed index update (minimum 5 minutes)
Do you see any other pros and cons?
[EDIT 2017-01-26]
There is another big PRO for our usage.
During development, we regularly add/rename/remove fields in the Azure index. In its current state, some schema modifications to an Azure Search index are not possible; we have to drop and re-create the index.
With a dedicated table containing the data, we simply issue a Reset to our indexer endpoint and the new index gets automatically re-populated.
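For reference, wiring the dedicated table to the index through the Azure Search REST API (a data source with the SQL integrated change tracking policy, plus a scheduled indexer) might look roughly like the sketch below; the service endpoint, admin key, table, index name and api-version are placeholders.

    # Sketch: create a data source with SQL integrated change tracking and a
    # scheduled indexer via the Azure Search REST API. All names, keys and the
    # api-version are placeholders.
    import requests

    SERVICE = "https://my-search-service.search.windows.net"
    API_VERSION = "2016-09-01"  # adjust to the version your service supports
    HEADERS = {"api-key": "<admin-key>", "Content-Type": "application/json"}

    # Data source pointing at the denormalized table, with change tracking enabled.
    datasource = {
        "name": "assets-datasource",
        "type": "azuresql",
        "credentials": {"connectionString": "<sql-connection-string>"},
        "container": {"name": "dbo.AssetSearch"},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
        },
    }
    requests.post(f"{SERVICE}/datasources?api-version={API_VERSION}",
                  headers=HEADERS, json=datasource).raise_for_status()

    # Indexer that runs every 5 minutes and pushes only changed rows to the index.
    indexer = {
        "name": "assets-indexer",
        "dataSourceName": "assets-datasource",
        "targetIndexName": "assets-index",
        "schedule": {"interval": "PT5M"},
    }
    requests.post(f"{SERVICE}/indexers?api-version={API_VERSION}",
                  headers=HEADERS, json=indexer).raise_for_status()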

Does SQLAlchemy support Query Notifications of SQL Server?

Does SQLAlchemy support Query Notifications of SQL Server?
If not, what is the closest one could get?
I imagine I could probably get SQLAlchemy to directly submit Transact-SQL queries to set up the query notifications, but is there a more generic way?
In the end, what I want is to be informed of any changes to the database that affect my query. This is for a Flask web server that should then, in turn, push the updates to clients showing some calculated information.
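For illustration of the two ideas in the question: SQLAlchemy can submit raw Transact-SQL through its text() construct, and a generic fallback when notifications are not available is to poll a rowversion column. A minimal polling sketch (the connection string, table and column names are hypothetical):

    # Minimal polling sketch: detect changes by watching the maximum rowversion
    # of a table instead of relying on Query Notifications. The connection
    # string, table (dbo.Orders) and column (RowVer) are hypothetical.
    import time

    from sqlalchemy import create_engine, text

    engine = create_engine(
        "mssql+pyodbc://user:password@myserver/MyDb?driver=ODBC+Driver+17+for+SQL+Server"
    )

    def current_version(conn):
        # rowversion columns are bumped automatically on every insert/update.
        return conn.execute(text("SELECT MAX(RowVer) FROM dbo.Orders")).scalar()

    with engine.connect() as conn:
        last_seen = current_version(conn)
        while True:
            time.sleep(5)
            latest = current_version(conn)
            if latest != last_seen:
                last_seen = latest
                # Re-run the query of interest here and push the recalculated
                # result to the Flask clients (e.g. via WebSockets or SSE).
                print("change detected, refresh clients")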

Load data to Elasticsearch from SQL Server

I'm new to Elasticsearch and I have a basic question.
I want to load data from a database and search it using Elasticsearch in an MVC.NET project, but because of the data I have in my database's tables I can't convert all of it to JSON and search it with Elasticsearch by hand. How should I fill Elasticsearch with data from the database in an MVC.NET project? I don't want the whole solution, because that's impossible, just a general and brief explanation. Thank you very much.
First of all, you should be able to model your data from SQL to ElasticSearch, as ElasticSearch is a NoSQL, document-oriented database/search engine.
You need an indexer to index SQL data to ElasticSearch.
Get all the columns associated with one record that you want to search in ElasticSearch from your SQL database (use joins if data is in multiple tables).
Use a dedicated Stored Procedure to get only needed data and construct a document class, serialize to JSON and index in your ElasticSearch cluster.
Use the ElasticSearch.net client, as it very neatly exposes bulk index APIs.
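For illustration, here is roughly what that pipeline looks like, as noted in the steps above. The answer targets the ElasticSearch.net client, but the same flow sketched in Python (elasticsearch-py plus pyodbc, with a hypothetical stored procedure and index name) is:

    # Sketch of the flow: stored procedure -> document per row -> bulk index.
    # The stored procedure dbo.GetSearchDocuments, the "Id" column and the
    # index name "products" are hypothetical.
    import pyodbc
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=MyDb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()
    cursor.execute("EXEC dbo.GetSearchDocuments")  # one row per document
    columns = [c[0] for c in cursor.description]

    def actions():
        # Turn every row into a bulk "index" action, keyed by the SQL primary key.
        for row in cursor:
            doc = dict(zip(columns, row))
            yield {"_index": "products", "_id": doc["Id"], "_source": doc}

    # Index all documents in batches.
    helpers.bulk(es, actions())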
Hope this will get you started. Have fun

Elastic Search SQL Server

We are trying to implement free-text search using Elasticsearch.
The plan is to use:
OurApplication
|
NEST
|
Elastic Search
|
?????????????
|
SQL Server
The database is SQL Server, and Create, Update and Delete operations on the tables are performed by multiple applications.
How can I populate and refresh indexes in Elasticsearch?
I went through the JDBC river route, but people are saying that it will be deprecated in future releases.
If I use the JDBC river, how can I refresh the indexes when updates happen?
Simply put, whenever any operation is performed (update, add or delete), call the corresponding Elasticsearch function. For example, when a record is updated, call a function that updates that record (by index and type) in Elasticsearch. Do the same thing for the insert and delete operations. I used the same approach for auto-suggestion and free-text search in my term project.
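As a concrete sketch of that idea, hooking Elasticsearch calls into the application's own create/update/delete paths could look like the following. It uses the Python client for brevity (the question's stack would use NEST instead), and the index name and document shape are hypothetical.

    # Mirror application CRUD operations into Elasticsearch. The index name
    # "products" and the record shape are placeholders; real code would call
    # these functions from the application's insert/update/delete paths.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def on_record_created(record: dict):
        # Index the new record under the same primary key used in SQL Server.
        # (document= is the 8.x client keyword; older clients use body=.)
        es.index(index="products", id=record["Id"], document=record)

    def on_record_updated(record: dict):
        # Re-indexing with the same id overwrites the previous document.
        es.index(index="products", id=record["Id"], document=record)

    def on_record_deleted(record_id):
        es.delete(index="products", id=record_id)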
Regards
The best solution for that is the JDBC importer.
The Java Database Connection (JDBC) importer allows fetching data from JDBC sources for indexing into Elasticsearch.
The JDBC importer was designed for tabular data. If you have tables with many joins, the JDBC importer is limited in the way to reconstruct deeply nested objects to JSON and process object semantics like object identity. Though it would be possible to extend the JDBC importer with a mapping feature where all the object properties could be specified, the current solution is focused on rather simple tabular data streams.
You can use a script file for the JDBC database connection and for the query. You store the data in an index and use that index in your application.
You can refresh the index by setting a cron expression in the schedule parameter, which enables repeated (time-scheduled) runs.
Example of a schedule parameter:
"schedule" : "0 0-59 0-23 ? * *"
This executes the JDBC importer every minute, every hour, every day of the week/month/year, so the index gets fresh data every minute.
The JDBC importer also supports MySQL and PostgreSQL, and provides many other features.
For details: https://github.com/jprante/elasticsear
