Azure Search Indexer Issues - azure-cognitive-search

We are seeing issues with our indexer: it throws an error every other time it runs against our data source. The indexer runs on a schedule, and we have SQL Server "Change Tracking" turned ON for both the database and the table we use for indexing. The indexer succeeds every other run, which is strange. On the failing runs the status page shows this error:
"Indexing was stopped because the data source has no change detection policy and corresponding quota of 100000 documents has been reached. To index more documents please add a change detection policy".
So, as I mentioned, we have a change detection policy in place (Change Tracking is ON) and the indexer running on a schedule, and it works every other time. We are on a "standard" billing tier, so we should not be subject to a 100,000-document quota. We believe this error may also explain why the counts in our data source and in the index do not match: the indexer is NOT deleting the IDs that have been removed from our source table.
I have attached an image showing the status page of our indexer. Please help! We have already launched our search, so we are seeing these issues pretty late in the game.
Thanks in advance and let me know if you need any more info.

It's not enough to enable change tracking on the SQL side. You also need to set up an integrated change tracking policy on your Azure Search datasource. This is described in the documentation.
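For example, here is a minimal sketch of updating the datasource through the REST API from Java. The service URL, datasource name, table name, connection string, api-key, and api-version below are all placeholders to adapt to your setup:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class EnableChangeTracking {
        public static void main(String[] args) throws Exception {
            // Placeholder service and datasource names; substitute your own.
            String url = "https://my-service.search.windows.net/datasources/my-datasource"
                       + "?api-version=2015-02-28";
            // The integrated change tracking policy tells Azure Search to use
            // SQL Server's change tracking when it crawls the table.
            String body = "{"
                + "\"name\": \"my-datasource\","
                + "\"type\": \"azuresql\","
                + "\"credentials\": { \"connectionString\": \"<your connection string>\" },"
                + "\"container\": { \"name\": \"MyTable\" },"
                + "\"dataChangeDetectionPolicy\": {"
                + "  \"@odata.type\": \"#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy\""
                + "}}";
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .header("api-key", "<your admin api-key>")
                .method("PUT", HttpRequest.BodyPublishers.ofString(body))
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

Once the integrated change tracking policy is in place, deleted rows should also be detected on subsequent runs, which may explain the count mismatch you are seeing between the source table and the index.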

Related

Is my GAE Search corrupt?

I've got a single index in a GAE Search application.
When I call index.put I get the OverQuotaError: The API call search.IndexDocument() required more quota than is available.
When I go to the GAE Console and look under Search, my index articleIndex contains no documents, but its Amount Used is 78.2KB. I've also tried retrieving documents, but none are returned.
I've tried using a new index, but I get the same error message in my application's logs.
I have a copy of my application and that continues to work fine - this uses the same code and data but in a separate application space.
I created a new app with the code from my "corrupted" installation and the new installation indexes fine.
Has anyone had a GAE Search index that, although empty, is listed as taking up space?
I've tried running my GAE routines right after my daily quota is renewed.
The quota representing storage usage is reconciled nightly, so if you have recently removed documents from the index that fact will not immediately be reflected. However, you say that using a new index still produces the same problem?
Note that daily quota renewal (not to be confused with the reconciliation mentioned above) does not affect the storage limit.
If you are still having trouble, you can file a report on the external issue tracker, mention the app ID and index name, and we can help investigate the current status of the index.
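If you want to rule out leftover documents holding that storage, one thing to try before (or while) filing the report is to sweep the index with a batch delete. A minimal Java sketch (the index name comes from the question; the batch size is arbitrary):

    import java.util.ArrayList;
    import java.util.List;
    import com.google.appengine.api.search.Document;
    import com.google.appengine.api.search.GetRequest;
    import com.google.appengine.api.search.GetResponse;
    import com.google.appengine.api.search.Index;
    import com.google.appengine.api.search.IndexSpec;
    import com.google.appengine.api.search.SearchServiceFactory;

    public class IndexSweeper {
        public static void deleteAllDocuments() {
            Index index = SearchServiceFactory.getSearchService()
                .getIndex(IndexSpec.newBuilder().setName("articleIndex").build());
            while (true) {
                // Fetch a batch of document IDs only (cheaper than full documents).
                GetRequest request = GetRequest.newBuilder()
                    .setReturningIdsOnly(true).setLimit(200).build();
                GetResponse<Document> response = index.getRange(request);
                if (response.getResults().isEmpty()) {
                    break;
                }
                List<String> ids = new ArrayList<String>();
                for (Document doc : response) {
                    ids.add(doc.getId());
                }
                index.delete(ids);
            }
        }
    }

Keep in mind that the storage figure itself only updates at the nightly reconciliation, so the Amount Used may still read 78.2KB until the following day.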

Broken indexer on Azure-Search (error: multiple columns with the same name)

We are experiencing a sudden and strange issue with our Azure Search indexer. We had an index (2015-02-28-preview version) with a corresponding datasource and indexer based on a table of a SQL Azure v12 database. Change tracking was enabled, and changes were properly propagated to the index. A couple of days ago, our attention was drawn to the fact that the latest changes in the database were no longer being replicated to the index. Being in a development phase, this index was frequently rebuilt by developers, and nobody noticed exactly when things started to go wrong.
In the Azure portal, the index is displayed in red with an error message stating we have a duplicate column in the datasource ("Datasource contains multiple columns with the same name 'ProductId'"), which is false. We cleaned the database and tried several things but could not find any duplicate column. As of today, the situation is the following:
1/ After deleting and recreating everything (index, indexer and datasource), the index is filled with the 2000 documents present in the SQL table
2/ The index is full and can be queried without any issue, though it still shows up in red with the "duplicate column" error message
3/ Because of this error, we cannot manually force a new indexing run from the Azure portal
4/ To reflect changes to the indexed table, we have to re-run the script that deletes the index, indexer and datasource and recreates everything. After running this script, we're back at step 1 above (index queryable, but in an error state and not updatable without a drop/recreate)
This problem seems to have occurred all of a sudden, without any change on our side, as if there had been a server-side version change. Is there any newer release of the Azure Search REST APIs available? Has anyone encountered the same issue, or does anyone have hints on things we could check?
Thanks for your help shedding some light on what may be broken here,
Problem fixed thanks to Eugene's investigation. He discovered a bug in the C# code used to generate the datasource: a casing difference between a "ProductId" column in the database and a "ProductID" field in the index.
We fixed the casing and the issue is gone. Microsoft support said that they'll "fix the issue in the coming weeks". The same code used to work properly (and still works on the first run), so it looks like the indexing process has somehow become more case-sensitive than before.

App Engine backup never finishes only clue is failure in map reduce worker_callback

Over the last few weeks we have repeatedly failed to complete a backup of the datastore using the Datastore Admin tool. We thought the issue had to do with quota errors we were running into, so we switched our application from a free to a paid app, but we still have problems.
Each time, we attempt to back up to the Blobstore, and the process never finishes. We see the backup in our Pending Backups list, but it never actually completes. We only have 43MB of data in total right now, so we don't see it as a data-transfer problem. Looking at our default task queue, we have two pending tasks: one is a call to /_ah/mapreduce/controller_callback and the other is a call to /_ah/mapreduce/worker_callback.
The worker_callback racks up its retry count, and the only error clue we have is on the Previous Run tab, which shows the last HTTP response code to be 500. There is no error message and nothing shows up in our error logs; it just keeps trying over and over again.
We've been able to narrow the backup problems down to a specific entity kind in a particular namespace, but we can't figure out why that entity kind is failing while the others are not. The major difference is that this entity kind has a large number of embedded entities, but if App Engine is able to read and put those entities, we can't understand why it has problems backing them up. The namespace where the error occurs has the most data stored for that entity kind compared to our other namespaces.
We think that if we could see what error is occurring in the worker_callback, we might be able to figure out why the backup is failing, or what is wrong with our data that's preventing the backup. Is there something we need to set up or enable through settings or configuration files to get more detailed information on the backup? Or is there some other avenue we should explore to investigate and fix this problem?
I should mention we are using the Java SDK as well as Objectify V3 to work with the datastore.
Thank you.
Well, with the App Engine team's help, we figured out what the problem was and worked around the issue. I want to give details in case anyone else runs into this problem.
In issue 8363, the App Engine team indicated that their logs showed the MapReduce failing because of the large number of properties our entity kind had. The specific entity kind causing the failure had a large number of variable properties, which generated errors when MapReduce tried to write out a schema. They indicated that the fix on their end was to skip such entities during backup so that the backup completes successfully.
What we did to work around the issue and make the backup work was to change how we told Objectify to store our data. The large number of properties was being created by our use of the @Embedded annotation on a HashMap class member field. Since @Embedded breaks classes down into individual components, it was generating a large number of properties. We switched the member field to @Serialized and then ran a conversion process to migrate the data to the new serialized property. This made backup/restore work again.
You can read more about the differences between embedded and serialized on Objectify's website.
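For illustration, here is a schematic before/after of the change described above. Class and field names are hypothetical; only the annotations matter:

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;
    import javax.persistence.Id;
    import com.googlecode.objectify.annotation.Serialized;

    public class Report {
        @Id Long id;

        // Before: @Embedded flattened the map into one datastore property per
        // entry (metrics.cpu, metrics.mem, ...), so entities accumulated a huge
        // number of variable property names and broke the backup's schema step.
        // @Embedded Map<String, MetricValue> metrics = new HashMap<String, MetricValue>();

        // After: @Serialized stores the whole map as a single opaque Blob
        // property. The stored values must implement Serializable, and the
        // individual map entries are no longer indexable or queryable.
        @Serialized Map<String, MetricValue> metrics = new HashMap<String, MetricValue>();

        public static class MetricValue implements Serializable {
            double value;
        }
    }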
snielson, would you mind opening an issue on our public issue tracker here? Remember to add your application ID so we can further debug this specific scenario.
Thanks!

SOLR 3.6.0, After a full re-index of a bunch of entities, some of my items are not making it into the SOLR index, but no logs are being generated

Using a StreamingUpdateSolrServer, I re-indexed my huge dataset into SOLR along these lines:

    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer(solrServerUrl, numDocsToAddInBatch, numOfThreads);
    for (Item item : items) {
        SolrInputDocument document = new SolrInputDocument();
        // ... populate the document's fields from the item ...
        server.add(document);
    }
    // When all items have been added:
    server.commit();
    server.optimize();
The problem:
Some of my items are not making it into the SOLR index, but no logs are being generated to tell me what happened.
I was able to find most of the documents, but some were missing. There were no errors in any logs, and I have substantial try/catch blocks with logging around all SOLRJ exceptions on the client side.
Verify logging is not being hidden for the SOLR WAR
You will definitely want to verify that the SOLR server log settings are not hiding the fact that documents are failing to be added to the index.
Because SOLR uses the SLF4J API, your SOLR server could be overriding the log settings, preventing you from seeing the error message when a document fails to be indexed.
If you have a custom {solr-war}/WEB-INF/classes/logging.properties, you will need to make sure its settings are not hiding the error messages.
By default, errors in adding an item should be shown automatically. So if you did not change your SOLR log settings at any point, you should see any indexing errors in your server log file.
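For reference, the stock Solr 3.x WAR logs through SLF4J to java.util.logging, so a logging.properties along these lines keeps indexing errors visible. This is a sketch under that assumption; adjust handlers and levels to your own setup:

    handlers = java.util.logging.ConsoleHandler
    .level = INFO
    java.util.logging.ConsoleHandler.level = INFO
    # Anything above WARNING here would swallow the errors logged when a
    # document fails to be added:
    org.apache.solr.level = INFO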
Troubleshoot why Documents are failing to be indexed
To investigate this, it is helpful to run the following verification step any time after the indexing is complete (a code sketch follows the steps):
Initialize new log log_fromsolr
Initialize new log log_notfound
For each Item…
-->Search SOLR for the item. If SOLR has the document, log each of the item's fields on a single line in log_fromsolr. This should include the uniqueKey for your document if you have one.
-->If the document cannot be found in SOLR for this item, write a line to log_notfound with all the fields from the object in the database, supplying the uniqueKey as the first field.
Once the verification step has completed, log_notfound contains a list of all the documents that failed to be added to the index.
You can use log_fromsolr to compare the fields of an item that made it into the index against one that did not.
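Here is a minimal SolrJ 3.x sketch of that verification pass. The Item type, the "id" uniqueKey, and the log file names are assumptions to adapt to your schema:

    import java.io.PrintWriter;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.util.ClientUtils;
    import org.apache.solr.common.SolrDocument;

    public class IndexVerifier {
        // Hypothetical representation of an item from the source database.
        public interface Item {
            String getId();
            Map<String, Object> getFields();
        }

        public static void verify(Iterable<Item> items, String solrServerUrl) throws Exception {
            SolrServer server = new CommonsHttpSolrServer(solrServerUrl);
            PrintWriter logFromSolr = new PrintWriter("log_fromsolr.txt");
            PrintWriter logNotFound = new PrintWriter("log_notfound.txt");
            for (Item item : items) {
                // Look the item up by its uniqueKey, escaping special characters.
                SolrQuery query = new SolrQuery(
                    "id:" + ClientUtils.escapeQueryChars(item.getId()));
                QueryResponse rsp = server.query(query);
                if (rsp.getResults().getNumFound() == 0) {
                    // uniqueKey first, then every field from the database.
                    logNotFound.println(item.getId() + "\t" + item.getFields());
                } else {
                    SolrDocument doc = rsp.getResults().get(0);
                    logFromSolr.println(item.getId() + "\t" + doc.getFieldValuesMap());
                }
            }
            logFromSolr.close();
            logNotFound.close();
        }
    }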
Verify it is not an intermittent issue
Sometimes it might be the case that it is not the same Items failing to be added to the index each time you try to index.
If you find objects in the log_notfound log, you will want to back up the current notfound log and run the indexing process again from scratch. Use a diff tool to see the differences between the first notfound log and the second notfound log.
An intermittent problem is evident when you see large numbers of differences in these files (Note: some differences are to be expected if new objects are being created in the database in between the first and second re-indexing).
If your problem is intermittent, it almost certainly points at your application code not committing your SOLR transactions correctly.
The same documents consistently come up missing each time it indexes
At this point we have to compare the documents that made it into the SOLR index against the documents that did not. Usually a field-by-field comparison of the objects will start turning up suspicious values that may be causing issues when the document is added to the index.
Try eliminating all the suspicious fields and re-indexing the entire set again to see whether the documents are still failing to be indexed. If that works, start re-introducing the fields you removed until you can pinpoint the one that is the issue.
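One way to do that elimination systematically is to filter the suspect fields while building each document, then shrink or grow the set between runs. Field names here are hypothetical:

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import org.apache.solr.common.SolrInputDocument;

    public class SuspectFieldFilter {
        // Fields currently under suspicion; edit between indexing runs.
        private static final Set<String> SUSPECT_FIELDS =
            new HashSet<String>(Arrays.asList("description", "lastModified"));

        // Build a document from an item's field map, skipping suspect fields.
        public static SolrInputDocument buildDocument(Map<String, Object> itemFields) {
            SolrInputDocument doc = new SolrInputDocument();
            for (Map.Entry<String, Object> entry : itemFields.entrySet()) {
                if (!SUSPECT_FIELDS.contains(entry.getKey())) {
                    doc.addField(entry.getKey(), entry.getValue());
                }
            }
            return doc;
        }
    }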

MOSS 2007 SSP Error "Search application '{0}' is not ready."

I'm trying to fix a broken SSP on a MOSS 2007 site. The problem I am running into manifests itself as follows...
In the SSP "Search Settings" page I get this message:
The search service is currently offline. Visit the Services on Server page in SharePoint Central Administration to verify whether the service is enabled. This might also be because an indexer move is in progress.
In the SSP "User Profiles and Properties" page I get this in red at the top:
An error has occurred while accessing the SQL Server database or the Office SharePoint Server Search service. If this is the first time you have seen this message, try again later. If this problem persists, contact your administrator.
I have contacted my administrator, but that is currently me and it turns out I don't know any more than I do about the problem.
In the Event Log I get the following message:
The Execute method of job definition Microsoft.Office.Server.Search.Administration.IndexingScheduleJobDefinition (ID 8714973c-0514-4e1a-be01-e1fe8bc01a18) threw an exception. More information is included below.
Search application '{0}' is not ready.
The Event ID is 6398, which isn't as useful as I had hoped, but I do find the message interesting in that it looks like a String.Format call where the substituted value is missing. Unfortunately it is not interesting in the sense of telling me how to fix the problem.
SharePoint's own log offers this:
UserProfileConfigManager.GetImportStatus() failed to obtain crawl status: System.InvalidOperationException: Search application '{0}' is not ready.
at Microsoft.Office.Server.Search.Administration.SearchApi..ctor(WellKnownSearchCatalogs catalog, SearchSharedApplication application)
at Microsoft.Office.Server.Search.Administration.SearchSharedApplication.get_SearchApi()
at Microsoft.Office.Server.UserProfiles.UserProfileConfigManager.c__DisplayClass3.b__0()
at Microsoft.Office.Server.Diagnostics.FirstChanceHandler.ExceptionFilter(Boolean fRethrowException, TryBlock tryBlock, FilterBlock filter, CatchBlock catchBlock, FinallyBlock finallyBlock)
I have tried stopping and starting the search service, removing and re-adding it from the administration panel, and pretty much everything else I could find to do with SharePoint's own administrative tools, which leads me to believe the problem may be database- or permissions-related.
There was a second SSP set up on the same server, which I think may have been part of the original cause of the problem, but removing it has made no difference.
Maybe you can make sense of this - I'm new to SharePoint, so it makes little sense to me:
"Service Shared, after looking for the solution much encontre this forum where a person tapeworm the same problem. After reading a infinity of commentaries, which I made to solve the problem was to create a new shared service, later it assigns the other applications to him and later I put it like predetermined, it initiates the import of profiles, and later the hearings, clearly first I did it in a site of tests just in case something happened, later eliminates the First Shared Service and finally the error I am solved. The snapshot of the Registry of the configuration of the application in the data base has been stored correctly. Context: application `SharedServices2 ′"
That quote is a rough machine translation, so take the details with a grain of salt; the gist is to create a new SSP, migrate everything onto it, and delete the old one.
Translation of:
http://tecnologiainformaticait.wordpress.com/2008/11/21/error-sharepoint-search-application-0-is-not-ready/
Personally, I'd try the msdn forums.
So it seems the problem was a corrupted Shared Services Provider (no idea how it came about, but there you go), and the only working solution I could find was to delete it and start again.
I suspect there may have been a more elegant fix by changing something in the database somewhere, but I don't know the SharePoint database model well enough to find it in the time available.
As an additional warning: if you do delete your SSP, you may find that it doesn't delete cleanly, leaving a bunch of SQL Server tasks that still try to run against an empty database, which can cause problems if you have anything else running on the same database server.
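If you go the delete-and-recreate route, asking stsadm to drop the SSP's databases at the same time should avoid leaving those orphaned SQL jobs behind. A sketch, with a placeholder SSP title:

    stsadm -o deletessp -title "SharedServices1" -deletedatabases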
Same problem here. My DBA deleted the search database correctly, and it still doesn't work.
I'll post the solution on my blog when I find something.
For the moment, we have opened a call with Microsoft.
1- Created a new SSP
2- In Central Admin, clicked on Shared Services Administration
3- Clicked on "Change Associations" and moved all the web apps to the new SSP
4- Chose a new search DB and selected the right server to do the indexing if you are in a farm
Problems created by this operation:
We noticed that we lost the statistics information for our sites.
If you have tried this solution, give us your feedback too.
Thanks.
http://dejacquelot.blogspot.com/
