Azure Search: Indexer from Cosmos DB DataSource with default values - azure-cognitive-search

I have multiple indexers on an index. In the index, I have the fields "sourceDisplayName" and "category".
Two of my indexers are each connected to their own datasource, both of which are Cosmos DB containers: one is called "Articles" and one is called "Events". This works fine so far.
Now I want the fields "sourceDisplayName" and "category" to always be "My Site" and "Article" for documents coming from the "Articles" datasource, and "My Site" and "Event" for documents coming from the "Events" datasource.
Is there any way to accomplish this or do I need to insert the values into each cosmos db document using a trigger or stored procedure?

This can probably be done with a skillset that enriches the documents read from Cosmos DB before they are added to the index. However, an even simpler approach for your case is to add the constant fields to the query in the datasource definition, as described in this answer. HTH!
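For example, the "Articles" datasource could project the constants directly in its Cosmos DB query. A sketch of such a datasource definition, where the connection string and the c.title/c.body fields are placeholders and only sourceDisplayName and category come from the question:

{
  "name": "articles-datasource",
  "type": "cosmosdb",
  "credentials": {
    "connectionString": "AccountEndpoint=https://<account>.documents.azure.com;AccountKey=<key>;Database=<database>"
  },
  "container": {
    "name": "Articles",
    "query": "SELECT c.id, c.title, c.body, 'My Site' AS sourceDisplayName, 'Article' AS category FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts"
  },
  "dataChangeDetectionPolicy": {
    "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName": "_ts"
  }
}

The "Events" datasource would use the same query with 'Event' as the category, so nothing needs to be inserted into the Cosmos DB documents via triggers or stored procedures.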

Related

Indexing EAV model using Solr

The database I have at hand uses the EAV model to describe all objects one can find in a house. Good or bad isn't the question; there is no choice but to keep and use this model. 6,000+ items point to 3,000+ attributes and 150,000+ attribute-values.
My task is to get this data into a Solr index for quick searching/sorting/faceting.
In Solr, using DIH, a regular SQL query is used to extract data. Each column name returned from the query is a 'field' (defined or not in a schema), and each row of the query's resultset is a 'document'.
Because the EAV model uses rows for attributes instead of columns, a simple query will not work; I need to flatten each item into a single row. What should my SQL query look like in order to extract all items from the DB? Is there a special Solr/DIH configuration I should consider?
There are some similar questions on SO, but none really helped.
Any pointers are much appreciated!

Deleting documents from cloudant db

I need to delete specific documents from a cloudant database as illustrated below.
All documents are in JSON format:
{
  "_id": "abcd",
  "_rev": "1_efgh",
  "moduleName": "co",
  "userid": "knight",
  "stateType": "yout"
}
I want to delete all the documents for which the "userid" field does not match any of the values in a list of String values,
say
["userid1", "userid2"]
I am using com.cloudant.client.api.CloudantClient in Spring Boot.
You can create a view to fetch the documents matching your criteria, and then perform a bulk operation to delete the docs you fetched.
See also related question: How to delete docs in Couch db
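A minimal sketch with the java-cloudant client; the database name and credentials are placeholders, and for brevity it reads _all_docs instead of the view suggested above (a view or a _find query would scale better on a large database):

import com.cloudant.client.api.ClientBuilder;
import com.cloudant.client.api.CloudantClient;
import com.cloudant.client.api.Database;
import com.cloudant.client.api.model.Response;
import com.google.gson.JsonObject;

import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CleanupByUserId {
    public static void main(String[] args) throws Exception {
        // Hypothetical account, database name and credentials.
        CloudantClient client = ClientBuilder.url(new URL("https://ACCOUNT.cloudant.com"))
                .username("USERNAME")
                .password("PASSWORD")
                .build();
        Database db = client.database("mydb", false);

        // userids that must be kept.
        Set<String> keep = new HashSet<>(Arrays.asList("userid1", "userid2"));

        // Fetch all documents including their bodies.
        List<JsonObject> docs = db.getAllDocsRequestBuilder()
                .includeDocs(true)
                .build()
                .getResponse()
                .getDocsAs(JsonObject.class);

        // Build deletion stubs (_id, _rev, _deleted) for every doc whose userid is not in the keep list.
        List<JsonObject> toDelete = new ArrayList<>();
        for (JsonObject doc : docs) {
            if (doc.has("userid") && !keep.contains(doc.get("userid").getAsString())) {
                JsonObject tombstone = new JsonObject();
                tombstone.addProperty("_id", doc.get("_id").getAsString());
                tombstone.addProperty("_rev", doc.get("_rev").getAsString());
                tombstone.addProperty("_deleted", true);
                toDelete.add(tombstone);
            }
        }

        // One bulk request deletes all matched documents.
        List<Response> responses = db.bulk(toDelete);
        System.out.println("Deleted " + responses.size() + " documents");
    }
}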

SQL Server - auto-populate field in one table with value from another table in another Database

I have two Azure SQL databases, with the following as an example:
Database Name: DataProp
Table Name: DataImports
Columns: SearchID, SourceID, Text, Status, Country
Database Name: Sources
Table Name: SourceInformation
Columns: SourceID, SourceTitle, Country
Right now, the Country column in the DataProp database is all NULL. I need to auto-populate the Country field in DataProp with the values of the Country fields in the Sources database. The common field between the two tables is SourceID. I need to do this for all existing data, as well as have it occur for future records.
What is the best way to accomplish this? A stored procedure that's set to run on a schedule? If so, I would appreciate guidance on the T-SQL syntax.
As a side note, I looked at the possibility of a computed column, but this will not work for us because we maintain an Azure Search index on our tables, and Azure Search can't index computed columns.
I don't think you'll be able to directly write a join between tables in two different databases. We had a similar problem and decided to move all the tables into a single database, in separate schemas. In your case you could write a WebJob to pull data from one table and update the other. I also found one article related to this, but I haven't personally tried it, so I'm not sure whether it works.
https://ppolyzos.com/2016/07/30/cross-database-queries-in-azure-sql-databases/
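For reference, the cross-database approach described in that article relies on Azure SQL elastic query (external tables). A rough sketch, run in the DataProp database, where the column types, credential names and secrets are assumptions:

-- One-time setup in the DataProp database (names, types and secrets are placeholders).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL SourcesCredential
    WITH IDENTITY = '<sql login on the Sources server>', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE SourcesDb WITH (
    TYPE = RDBMS,
    LOCATION = '<servername>.database.windows.net',
    DATABASE_NAME = 'Sources',
    CREDENTIAL = SourcesCredential
);

-- External table pointing at Sources.dbo.SourceInformation.
CREATE EXTERNAL TABLE dbo.SourceInformation (
    SourceID    INT,
    SourceTitle NVARCHAR(200),
    Country     NVARCHAR(100)
) WITH (DATA_SOURCE = SourcesDb);

-- Backfill the existing rows; schedule the same UPDATE (WebJob, elastic job, etc.) to cover future records.
UPDATE d
SET    d.Country = s.Country
FROM   dbo.DataImports AS d
       JOIN dbo.SourceInformation AS s ON s.SourceID = d.SourceID
WHERE  d.Country IS NULL;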

Database mapping to Solr

I'm building a Java app using a relational database, and I wish to map its primary data to one or more Solr indexes. However, I'm not sure how to map the components of a database. At the moment I've mapped a single row cell to a Solr/Lucene Document.
A doc would be something like this (each line is a field):
schema: "schemaName"
table: "tableName"
column: "columnName"
row: "rowNumber"
data: "data on schemaName.tableName.columnName.row"
This allows me to have a "fixed" Solr schema.xml (as far as I know it has to be defined "before" creating indexes). Also, dynamic fields don't seem to serve my purpose.
What I've found while searching is that a single row is usually mapped to a Solr Document and each column is mapped as a Field. But how can I add the column names as fields in schema.xml when I don't know which columns a table has? Also, I would need the info to be queried as if it were SQL, e.g. search for all rows of a column in a table, and so on.
With my current "solution" I can do those kinds of queries, but I'm worried about performance, as I'm new to Solr and don't know the implications it may have.
So, what do you say about my "solution"? Is there another way to map a database to a Solr index, given that the schema.xml fields have to be set before indexing? I've also "heard" that a table is usually mapped to an index: how could I achieve that?
Maybe I'm just being a noob, but from the research I did I don't see how I can map a database row to a Solr Document without messing with the schema.xml fields.
I would appreciate any thoughts :) Regards.
You can specify your table columns in the schema beforehand, or use dynamic fields, and then use the Solr DataImportHandler (DIH) to import the data from the database into Solr. Alias the columns in your DIH queries so they match your dynamic field names.
Please go through Solr DIH for database integration
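For example, a catch-all dynamic field in schema.xml plus a DIH data-config.xml whose query aliases columns onto that pattern could look roughly like this (the JDBC driver, connection details, and the items table are made up):

schema.xml:
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

data-config.xml:
  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
    <document>
      <!-- Every column aliased to *_s lands in a string dynamic field without touching schema.xml -->
      <entity name="item"
              query="SELECT id, name AS name_s, price AS price_s FROM items">
        <field column="id" name="id"/>
      </entity>
    </document>
  </dataConfig>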

Can I use one entity for multiple datasources in Solr?

I want to import multiple sharded databases into one entity in Solr. The sharded databases have the same schema.
Is it possible?
Thanks.
Sure, it should be possible. Are you using the DataImportHandler (the question does not say)?
Have you already done a DIH import of a single database and want to just have that definition apply to multiple sources?
If so, you have two basic options to go forward:
Copy the entity definition with appropriately changed sources. When you run DIH, it will execute the first entity, then the other (see the sketch after this list).
Create an outer entity with rootEntity="false" and with your database entity inside of it. Then your outer entity needs to generate some sort of variable on each round, corresponding to a different shard. Your inner entity will use that variable to connect to the correct shard and execute the load. You could, for example, have an XML file with your dataSource names and let the outer entity parse it with XPathEntityProcessor.
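For the first option, a minimal data-config.xml could look roughly like this; the hosts, credentials, table and column names are made up, and the second, XPathEntityProcessor-based option would remove the duplication at the cost of a bit more wiring:

<dataConfig>
  <!-- One named JDBC dataSource per shard; both shards expose the same schema. -->
  <dataSource name="shard1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host1/mydb" user="user" password="pass"/>
  <dataSource name="shard2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://host2/mydb" user="user" password="pass"/>
  <document>
    <!-- Same query, duplicated once per shard; DIH runs them one after the other. -->
    <entity name="itemsFromShard1" dataSource="shard1"
            query="SELECT id, name FROM items"/>
    <entity name="itemsFromShard2" dataSource="shard2"
            query="SELECT id, name FROM items"/>
  </document>
</dataConfig>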