I have an Azure Search index backed by a Cosmos DB data source.
In my DB I have four documents in one collection. Each document has four fields: field1, field2, field3, field4, and all of them have a value (none are NULL).
In the Azure portal, my index currently has three fields (field1, field2, field3) marked as Searchable and Retrievable. Search works properly and I get four records, each with the three fields and their values.
Now I updated my index by adding a new field, field4, as Searchable and Retrievable, then ran my indexer to update the index. I noticed that only one record has a value for field4 while the others have field4: null.
I repeated the test several times and noticed it happens randomly to some of the records: sometimes one record has a value for field4, sometimes two or three do.
Does anyone know about this issue and what the magic behind the scenes is?
The indexer will not update the index if the underlying Cosmos DB data hasn't changed, because indexing is incremental. To trigger re-indexing from scratch, reset your indexer - you can do that with the Reset Indexer API or directly in the Azure portal (indexer properties blade).
The likely reason you sometimes see one or more changes "magically" picked up is that Azure Search's incremental indexing logic compensates for clock skew, so recently added Cosmos DB documents can be processed more than once.
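For illustration, a minimal sketch of the reset-then-run sequence over the REST API (the service name, indexer name, admin key and api-version below are placeholders for your own values):

import requests

SERVICE = "my-search-service"
INDEXER = "my-cosmosdb-indexer"
ADMIN_KEY = "<admin-api-key>"
API_VERSION = "2020-06-30"  # use whichever API version your service targets

base = f"https://{SERVICE}.search.windows.net/indexers/{INDEXER}"
headers = {"api-key": ADMIN_KEY, "Content-Type": "application/json"}
params = {"api-version": API_VERSION}

# Reset clears the indexer's change-tracking state...
requests.post(f"{base}/reset", headers=headers, params=params).raise_for_status()

# ...so a subsequent run re-reads every document and field4 gets populated everywhere.
requests.post(f"{base}/run", headers=headers, params=params).raise_for_status()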
Related
There are many records in a Solr collection. We need to update a particular column to "hello".
I have executed the JSON below using the update request handler, but it creates a new record with primary key * and sets its column to hello.
{
  "Primary_key": "*",
  "Column1": {"set": "hello"}
}
Is there any way to update column1 in all records to hello?
There is no way to update documents in Solr using a query like '*'.
In my opinion, the best way to speed up the column update in this case is to submit multiple documents in a single update request and use atomic updates.
Atomic updates allow changing only specific fields of a document without having to reindex the entire document.
You can send multiple updates in one request like this:
[{"id":"1",
"column1":{"set":"hello"},
{"id":"2",
"column1":{"set":"hello"}]
There is a very old JIRA issue about this.
I have a table in the ndb datastore. In it I have:
updated = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
created = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
With this structure I have many records in the table. Now I am changing the unindexed fields to indexed=True. Will it index all the updated and created data present in the table so far, or will it only index data added after the change?
And how do I index the existing unindexed rows of these columns?
These properties will not be indexed on existing entities until you rewrite those entities with the index enabled. This is because indexes are set on a per-entity level.
To ensure all these fields are indexed, you'll need to read every entity and then write it back. For smaller datasets, you can do this with a simple query and loop (a rough sketch follows below). For larger datasets you will want to explore something like Cloud Dataflow.
If you have a large dataset and cost concerns, you could do some optimizations. For example, do a keys-only query against the indexed fields, and skip writing back any entity that appears in that result (since it's already indexed).
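For the smaller-dataset case, the query-and-loop approach can be sketched roughly like this (MyModel and the batch size are placeholders for your own model and tuning):

from google.appengine.ext import ndb

# Placeholder model, now declared with indexed=True so rewritten entities
# get index entries for these properties.
class MyModel(ndb.Model):
    updated = ndb.DateTimeProperty(auto_now_add=True, indexed=True)
    created = ndb.DateTimeProperty(auto_now_add=True, indexed=True)

def reindex_all(batch_size=500):
    # Read every entity and write it back unchanged; the put is what creates the index rows.
    batch = []
    for entity in MyModel.query():
        batch.append(entity)
        if len(batch) >= batch_size:
            ndb.put_multi(batch)
            batch = []
    if batch:
        ndb.put_multi(batch)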
We are working on an SLI Search module for our client. Is it a good approach to create separate tables, or should I manage all clients' data in a single table?
Keep in mind that a client can ask to update their module, which means the table structure can be different for each client.
Secondly, clients will give me their data and I will update all clients' data either in their own tables or in a single table, using a package.
So, would it be a good approach to create separate tables in the database, or should I make a centralized table for all our clients?
If the table fields vary from client (customer) to client, then it is recommended to have separate tables; otherwise you will create a lot of null values.
Secondly, if every client is a separate instance and there is no correlation between them, then why not have a separate schema or database for each?
It sounds like an SQLite database would be a good fit, since every client's data structure is unique; that also keeps it portable, so you can amend one SQLite database at a time.
Centralised Tables Approach
This is an approach we researched and used recently, but you need to consider indexing issues so that searches on the fields stay fast.
You can create a flat table with many sequentially numbered columns, e.g. Field1, Field2, Field3, Field4 ... Field99, Field100 ... Field150 (as many fields as your potential customers need).
You then create another table in which you map each client's (customer's) labels to these fields. For example:
Client ABC's id is 10032
They use Field1 through Field11
Field1 label is FirstName
Field2 label is Surname
Field3 label is DOB
...
...
Field11 label is UserCountry
Now, every time records are shown, you fetch the logged-in user's labels and map them to the fields.
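In application code the mapping boils down to a simple rename; a minimal sketch (the names and values below are made up for illustration):

# Label mapping loaded from the mapping table for client 10032.
field_labels = {
    "Field1": "FirstName",
    "Field2": "Surname",
    "Field3": "DOB",
    "Field11": "UserCountry",
}

# One row from the flat table, keyed by the generic column names.
record = {"Field1": "Ada", "Field2": "Lovelace", "Field3": "1815-12-10", "Field11": "UK"}

# Rename the generic columns to the client's labels before display.
labelled = {field_labels.get(column, column): value for column, value in record.items()}
print(labelled)  # {'FirstName': 'Ada', 'Surname': 'Lovelace', 'DOB': '1815-12-10', 'UserCountry': 'UK'}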
I hope this answers the question.
I'm building a Java app using a relational database and I wish to map its primary data to one or more Solr indexes. However, I'm not sure how to map the components of a database. At the moment I've mapped a single row cell to a Solr/Lucene Document.
A doc would be something like this (each line is a field):
schema: "schemaName"
table: "tableName"
column: "columnName"
row: "rowNumber"
data: "data on schemaName.tableName.columnName.row"
This allows me to have a "fixed" Solr schema.xml (as far as I know it has to be defined "before" creating indexes). Also, dynamic fields don't seem to serve my purpose.
What I've found while searching is that a single row is usually mapped to a Solr Document and each column is mapped as a Field. But how can I add the column names as fields in schema.xml when I don't know which columns a table has? Also, I would need the info to be queryable as if it were SQL, i.e. search for all rows of a column in a table, and so on.
With my current "solution" I can do those kinds of queries, but I'm worried about performance, as I'm new to Solr and don't know the implications it may have.
So, what do you think of my "solution"? Is there another way to map a database to a Solr index, given that the schema.xml fields have to be set before indexing? I've also "heard" that a table is usually mapped to an index: how could I achieve that?
Maybe I'm just being a noob, but from the research I did I don't see how I can map a database row to a Solr Document without messing with the schema.xml fields.
I would appreciate any thoughts :) Regards.
You can specify your table columns in the schema beforehand, or use dynamic fields, and then use the Solr DIH (Data Import Handler) to import the data into Solr from the database. Select your dynamic field names in the queries for DIH.
Please go through the Solr DIH documentation for database integration.
I am a newbie to JavaScript and Fusion Tables, and I am setting up a project which collects and processes data in a Google spreadsheet and then submits it to a Fusion Table database. The archived data has to be retrieved back into the spreadsheet with a SELECT query, to be used for further processing or to be updated and submitted again to the Fusion Table.
The structure of the record to be inserted and retrieved includes the following fields:
field1, field2, field3, etc.
where field1 is a date/time field and field2 represents a list of countries.
The query I would like to create should retrieve the subset of records containing the most recent date (field1) for each country (field2). I tried to build a SELECT query with a WHERE clause, but it doesn't work. I read somewhere that this kind of problem can be solved with a self join, but I am not sure that kind of join is possible in Fusion Tables and I don't know how to proceed.
Can anybody give me some hints or suggestions to find a solution?
Thanks in advance
Luigi