Azure Search fails to index WADLogsTable

I am creating a log crawler to combine logs from all our Azure applications. Some of them store logs in SLAB format and some of them simply use the Azure Diagnostics tracer. Since version 2.6, the Azure Diagnostics tracer actually creates two timestamp columns in the Azure table "WADLogsTable". Microsoft's explanation for this behavior is the following:
"https://azure.microsoft.com/en-us/documentation/articles/vs-azure-tools-diagnostics-for-cloud-services-and-virtual-machines/"
TIMESTAMP is PreciseTimeStamp rounded down to the upload frequency boundary. So, if your upload frequency is 5 minutes and the event time is 00:17:12, TIMESTAMP will be 00:15:00.
Timestamp is the timestamp at which the entity was created in the Azure table.
Sadly, Azure Search currently only supports case-insensitive column mapping, so when I create a simple data source, index, and indexer, I get an exception about multiple columns existing in the data source with the same name (Timestamp).
I tried not to use Timestamp and instead use the PreciseTimeStamp column, but then I get a different exception:
"Specified cast is not valid.Couldn't store <8/18/2016 12:10:00 AM> in Timestamp Column. Expected type is DateTimeOffset."
I assume this is because the current Azure Table data source insists on keeping track of the Timestamp column for change tracking behind the scenes.
The behavior is the same if I programmatically create all the objects, or use the "Import Data" functionality on the portal.
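For reference, here is roughly what the programmatic attempt looks like (REST calls sketched in Python with requests; the service name, admin key, index name, and connection string are placeholders, and the fieldMappings entry is where I try to map PreciseTimeStamp instead of Timestamp):

import requests

SERVICE = "https://myservice.search.windows.net"   # placeholder service
API = "api-version=2016-09-01"                     # whichever version you target
HEADERS = {"api-key": "<admin-key>", "Content-Type": "application/json"}

# Data source over the diagnostics table
requests.post(f"{SERVICE}/datasources?{API}", headers=HEADERS, json={
    "name": "wadlogs-ds",
    "type": "azuretable",
    "credentials": {"connectionString": "<storage-connection-string>"},
    "container": {"name": "WADLogsTable"},
})

# Indexer that tries to map PreciseTimeStamp to a DateTimeOffset field in the index
requests.post(f"{SERVICE}/indexers?{API}", headers=HEADERS, json={
    "name": "wadlogs-indexer",
    "dataSourceName": "wadlogs-ds",
    "targetIndexName": "wadlogs-index",
    "fieldMappings": [
        {"sourceFieldName": "PreciseTimeStamp", "targetFieldName": "EventTime"}
    ],
})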
Does anyone have any other strategy or approach to overcome this issue?
We are happily indexing our SLAB tables btw, it's just the WAD failing now.

Related

How to push data from an on-premises database to Tableau CRM

We have an on-premises Oracle database installed on a server. We have to create some charts/dashboards with Tableau CRM on that on-premises data. Note that Tableau CRM is not Tableau Online; it is a Tableau version for the Salesforce ecosystem.
Tableau CRM has APIs, so we can push data to it or upload CSVs to it programmatically.
So, what can be done is either of the following:
Run a Node.js app on the on-premises server, pull data from the Oracle DB, and then push to Tableau CRM via the TCRM API.
Run a Node.js app on the on-premises server, pull data from the Oracle DB, create a CSV, and push the CSV via the TCRM API.
I have tested the second option and it is working fine.
But, as you all know, it is not efficient, because I have to run a cron job and schedule the process multiple times a day, and I have to query the full table every time.
I am looking for a better approach. Are there any other tools or technologies you know of that would give a smoother sync process?
Thanks
The second method you described in the question is a good solution. However, you can optimize it a bit.
I have to query the full table all the time.
This can be avoided. If you take a look at the documentation for the SObject InsightsExternalData, you can see that it has a field named Operation which takes one of these values: Append, Delete, Overwrite, Upsert.
What you have to do is use the Append operation when you push data to Tableau CRM, and push only the records that don't already exist in TCRM. That way you only query the delta records from your database. This reduces the size of the CSV you have to push, and since it is smaller, it takes less time to upload into TCRM.
However, to implement this solution you need two things on the database side.
A unique identifier that uniquely identifies every record in the database
A DateTime field
Once you have these two, write a query that sorts the records in ascending order of the DateTime field and returns only the rows that come after the last record you pushed into TCRM. That way your result set contains only the delta records that you don't yet have in TCRM. After that you can use the same pipeline you built to push the data.
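A rough sketch of that delta push, assuming the simple_salesforce Python library and illustrative dataset/file names (the delta extraction itself is whatever query you already run against your database):

import base64
from simple_salesforce import Salesforce

sf = Salesforce(username="user@example.com", password="...", security_token="...")

# delta.csv contains only the rows newer than the last record already in TCRM
csv_bytes = open("delta.csv", "rb").read()

# 1. Create the upload header with Operation = Append so only new rows are added
header = sf.InsightsExternalData.create({
    "EdgemartAlias": "MyDataset",   # illustrative dataset name
    "Format": "Csv",
    "Operation": "Append",
    "Action": "None",
})

# 2. Attach the CSV content as a data part (split into multiple parts if large)
sf.InsightsExternalDataPart.create({
    "InsightsExternalDataId": header["id"],
    "PartNumber": 1,
    "DataFile": base64.b64encode(csv_bytes).decode("ascii"),
})

# 3. Tell the platform to start processing the uploaded data
sf.InsightsExternalData.update(header["id"], {"Action": "Process"})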

Import Active Directory to SQL Server

I'm working on a Microsoft BI project.
I am currently in the process of connecting my systems to SQL Server. I want to connect my Active Directory to a table in SQL Server and have it sync to that table once per hour. This means that every hour the Active Directory details will be updated.
I realize that it is necessary to use SSIS to do this. I would be happy for help connecting my AD to SQL Server with SSIS.
There are two routes available to you to sync AD user objects to a table. You can use an ADO.NET source in an SSIS Data Flow Task or you can write custom .NET code as part of a Script source. The right answer depends on your team's ability to maintain and troubleshoot a particular solution as well as the size of your AD tree/forest. If you're a small shop (under a thousand users), anything is going to work. If you're a larger shop, then you need to worry about the query mechanism and the total rows returned, as there is an upper bound on how many results can be returned in a single query. In that case, a Script source likely makes more sense, as you can more easily write a query to pull all the accounts that start with A, B, etc. (a rough sketch of the paging idea follows the general steps below). I've never worked with Hebrew, so I assume one could do a similar filter for aleph, bet, etc.
General steps
Identify your domain controller, as you need to know what server to ask for information. I do not know how to deal with Azure Active Directory requests, as I believe it works a bit differently there, but I haven't had client work that needed it.
Create an ADO.NET Connection Manager. Use the ".Net Providers for OleDb\OLE DB Provider for Microsoft Directory Services" and point it to your DC.
Write a query to pull back the data you need. Based on the comment, it seems you want something like this
SELECT
distinguishedName
, mail
, samaccountname
, mobile
, telephoneNumber
, objectSid
, userAccountControl
, title
, sn
FROM
'LDAP://DC=domain,DC=net'
WHERE
sAMAccountType = 805306368
ORDER BY
sAMAccountName ASC
Using that query, we'll add a Data Flow Task and within it, add an ADO.NET Source. Configure it to use our ADO.NET Connection manager and use the above query (adjusting for the LDAP line and any other fields you do/don't need)
Add an OLE DB Connection Manager to your package and point it to the database that will record the data.
Add an OLE DB Destination to the Data Flow and connect the output line from the ADO.NET Source to this destination. Pick the table in the drop down list and on the Columns tab, make sure you have all of your columns connected. You might run into issues where the data types don't match so you'll need to figure out how to handle that - either change your table definition to match the source or you need to add data conversion/derived columns components to the data flow to mangle the data into the correct shape.
You might be tempted to pull in group membership. Do not. Make that a separate task as a person might be a member of many groups (at one client, I am in 94 groups). Also, the MemberOf data type is a DistinguishedName, DN, which SSIS cannot handle. So, check your types before you add them into an AD query.
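If the result-size limit mentioned above pushes you toward the script route, the core idea is just a paged LDAP search. The Script component itself would be .NET, but the concept looks roughly like this, sketched in Python with the ldap3 library (server, account, and base DN are placeholders):

from ldap3 import Server, Connection, SUBTREE

server = Server("dc01.domain.net")                  # placeholder domain controller
conn = Connection(server, user="DOMAIN\\svc_ssis",  # placeholder service account
                  password="...", auto_bind=True)

# Paged search: results come back in pages of 500, so the server-side
# cap on results per query is never hit.
entries = conn.extend.standard.paged_search(
    search_base="DC=domain,DC=net",
    search_filter="(sAMAccountType=805306368)",     # user accounts only
    search_scope=SUBTREE,
    attributes=["distinguishedName", "mail", "sAMAccountName",
                "mobile", "telephoneNumber", "title", "sn"],
    paged_size=500,
    generator=True,
)

for entry in entries:
    if entry.get("type") == "searchResEntry":
        row = entry["attributes"]
        # write row to your SQL staging table here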
References
ldap query to get disabled user records with whenchanged within 30 days
http://billfellows.blogspot.com/2011/04/active-directory-ssis-data-source.html
http://billfellows.blogspot.com/2013/11/biml-active-directory-ssis-data-source.html
Is there a particular part of the AD that you want? In any but the smallest corporations the AD tends to be huge. Making a SQL copy of an entire forest every hour is a very strange thing that may have many adverse effects on your AD, network, security and domain-wide performance.
If you are just looking to backup your AD, I believe that there are other options available, specific to the Windows AD (maybe even built-in, I'm not an AD expert).
If you really, truly want to do this here is a link to get you started: https://social.technet.microsoft.com/Forums/ie/en-US/79bb4879-4d82-4a41-81a4-c62afc6c4b1e/copy-all-ad-objects-to-sql-database?forum=winserverDS. You can find many more articles on this just by Googling "Copy AD to Sql".
However, heed the warnings well: the AD is effectively a multi-domain-wide distributed database, attempting to copy it into a centralized database like SQL Server every hour is contra-indicated. You are really fighting against its design.
UPDATE Based on the Comments:
Basically you've got too much in one question here. SQL Server, SSIS, and Active Directory (AD) are each huge subjects in and of themselves, and the first time you attempt to use all of them together you will run into many individual issues depending on your environment, experience, and specific project goals. We cannot anticipate all of them in a single answer on this site.
You need to start using the information you have from the following links to begin to implement this yourself, and then ask specific questions as you run into problems along the way.
Here are the links that you can start with,
The link I provided above from MS: https://social.technet.microsoft.com/Forums/ie/en-US/79bb4879-4d82-4a41-81a4-c62afc6c4b1e/copy-all-ad-objects-to-sql-database?forum=winserverDS
The link that you provided in the comments, which explains how to set up ADSI as a linked server and how to use T-SQL on it: https://yiengly.wordpress.com/2018/04/08/query-active-directory-in-sql-server-with-linked-server/
This one explain how to use AD from within an SSIS DataFlow task (but is limited to 1000 rows): https://dataqueen.unlimitedviz.com/2012/05/importing-data-from-active-directory-using-ssis/
This related one explains how to use AD within an SSIS Script task to get around the DataFlow task limits: https://dataqueen.unlimitedviz.com/2012/09/get-around-active-directory-paging-on-ssis-import/
As you work your way through this you may run into specific problems, which you can ask about at https://dba.stackexchange.com, which has more specific expertise with SQL Server and SSIS.
Based on your goals, I think that you will want to use a staging table approach. That is, use your AD/SQL query to import all of the AD user records into a new/empty staging table that has the same column definition as your production table, then use a MERGE query to find and update the changed user records and insert the new user records (this is called a differential or Type II update).
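A minimal sketch of that staging-table merge, assuming pyodbc and illustrative table/column names (your real column list would match whatever you pull from AD, and NULL-safe comparisons are omitted for brevity):

import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=sqlserver01;DATABASE=Staging;Trusted_Connection=yes;")
cur = conn.cursor()

# The staging table is truncated and reloaded from AD on every run,
# then merged into the production table.
cur.execute("""
MERGE dbo.AdUsers AS target
USING dbo.AdUsers_Staging AS source
    ON target.objectSid = source.objectSid
WHEN MATCHED AND (target.mail <> source.mail
               OR target.telephoneNumber <> source.telephoneNumber
               OR target.title <> source.title) THEN
    UPDATE SET mail = source.mail,
               telephoneNumber = source.telephoneNumber,
               title = source.title
WHEN NOT MATCHED BY TARGET THEN
    INSERT (objectSid, sAMAccountName, mail, telephoneNumber, title)
    VALUES (source.objectSid, source.sAMAccountName, source.mail,
            source.telephoneNumber, source.title);
""")
conn.commit()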

AWS DMS (Database Migration Services) full LOB not working for SQL Server

I'm trying to migrate a SQL Server table using AWS DMS to a DynamoDb target.
The table structure is as follows:
|SourceTableID|Title |Status|Display|LongDescription|
|-------------|-----------|------|-------|---------------|
|VARCHAR(100) |VARCHAR(50)|INT |BIT |NVARCHAR(MAX) |
Every field is migrated without errors and is present in my target DynamoDB table except for the LongDescription column. This is because it is an NVARCHAR(MAX) column.
According to the documentation:
The following limitations apply when using DynamoDB as a target:
AWS DMS doesn't support LOB data unless it is a CLOB. AWS DMS converts
CLOB data into a DynamoDB string when migrating data.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.DynamoDB.html
Source Data Types for SQL Server
|SQL Server Data Types|AWS DMS Data Types|
|---------------------|------------------|
|NVARCHAR (max)       |NCLOB, TEXT       |
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html
Depending on my task configuration, one of the following two scenarios occurs:
Limited LOB mode: information for the LongDescription column is migrated properly to DynamoDB, however the text is truncated
Full LOB mode: information for the LongDescription column is NOT migrated properly to DynamoDB
How can I correctly migrate an NVARCHAR(MAX) column to DynamoDb without losing any data?
Thanks!
Progress Report
I have already tried migrating to an S3 target. However, it looks like S3 doesn't support full LOB mode:
Limitations to Using Amazon S3 as a Target
Full LOB mode is not supported.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html
I cannot use the COMPRESS T-SQL function to store the LongDescription column as binary, since my SQL Server version is 2014 (COMPRESS requires SQL Server 2016 or later).
I tried to run the migration task in limited LOB mode and use the maximum byte size as the limit. My maximum byte size is 45155996, so I set 46000 KB as the limit. This results in the following error:
Failed to put item into table 'TestMigration_4' with data record with source PK column 'SourceTableID' and value '123456'
You might want to check AWS's best practices page for storing large items/attributes in DynamoDB:
If your application needs to store more data in an item than the DynamoDB size limit permits, you can try compressing one or more large attributes or breaking the item into multiple items (efficiently indexed by sort keys). You can also store the item as an object in Amazon Simple Storage Service (Amazon S3) and store the Amazon S3 object identifier in your DynamoDB item.
I actually like the idea of saving your LongDescription in S3 and referencing its identifier in DynamoDB. I have never tried it, but one idea would be to use DMS's ability to create multiple migration tasks to accomplish this, or even create some kind of ETL solution as a last resort, making use of DMS's CDC capability. You might want to get in touch with their support team to make sure it works.
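If you go that route, the write path on the target side would look roughly like this (boto3, with illustrative bucket and attribute names; a custom ETL job or something fed by CDC would drive it):

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("TestMigration_4")  # your target table

def write_row(row):
    # Store the oversized text in S3...
    s3_key = f"long-descriptions/{row['SourceTableID']}.txt"
    s3.put_object(Bucket="my-migration-bucket",              # illustrative bucket
                  Key=s3_key,
                  Body=row["LongDescription"].encode("utf-8"))

    # ...and keep only the S3 pointer in DynamoDB, so the 400 KB
    # item size limit is never a problem.
    table.put_item(Item={
        "SourceTableID": row["SourceTableID"],
        "Title": row["Title"],
        "Status": row["Status"],
        "Display": row["Display"],
        "LongDescriptionS3Key": s3_key,
    })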
Hope it helps!

Multiple indexer for single index in Azure Search?

I have two data sources, one in MS SQL and one in Azure Table storage; both of them hold users' data, but different information.
Are there any risks or possible issues if I create an indexer for each of the data sources and sync them into a single index, so I can do one search instead of two?
I do not see an option that allows me to create an indexer only, so I wonder whether Microsoft is trying to prevent that or not.
When you have more than one indexer updating the same index, races between updates to the same document are possible since Azure Search does not yet support document versioning.
If each indexer updates different document fields, then the order doesn't matter; if multiple indexers update the same field(s), then the last writer wins (and you can't control which indexer will be the last writer).
If you use a deletion detection policy to remove documents from the index, then, to avoid possible races, a document should be marked deleted in both SQL and Azure Table.
I do not see an option that allows me to create an indexer only, so I wonder whether Microsoft is trying to prevent that or not.
I don't understand the above question.
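That said, the mechanics of feeding one index from two data sources are just two indexer definitions with the same targetIndexName, roughly like this (REST sketched in Python; names are illustrative):

import requests

SERVICE = "https://myservice.search.windows.net"   # placeholder service
API = "api-version=2016-09-01"
HEADERS = {"api-key": "<admin-key>", "Content-Type": "application/json"}

# One indexer per data source, both writing to the same index
for name, datasource in [("users-sql-indexer", "users-sql-ds"),
                         ("users-table-indexer", "users-table-ds")]:
    requests.post(f"{SERVICE}/indexers?{API}", headers=HEADERS, json={
        "name": name,
        "dataSourceName": datasource,
        "targetIndexName": "users-index",
    })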

Azure Search - Creating a dedicated search table in SQL for use with an indexer

I'm building an asset search engine.
The data I need indexed for each asset is scattered across multiple tables in the SQL database.
Also, there are many events in the application that trigger updates to an asset's indexed fields (Draft, Rejected, Produced, ...).
I'm considering creating a new denormalized table in the SQL database that would exist solely for the Azure Search Index.
It would be an exact copy of the Azure Search Index fields.
The application would be responsible for filling and updating the SQL table through various event handlers.
I could then use an Azure SQL indexer on a schedule to automatically import the data into the Azure Search index.
PROS:
We are used to dealing with SQL table operations, so the application code remains standard; there is no need to learn the Azure Search API
Both the transactional and the search models are updated in the same SQL transaction (atomically). The indexer then updates the index in an eventually consistent manner and handles the retry logic.
Built-in support for change detection with SQL Integrated Change Tracking Policy
CONS:
Indexed data space usage is duplicated in the SQL database
Delayed Index update (minimum 5 minutes)
Do you see any other pros and cons?
[EDIT 2017-01-26]
There is another big PRO for our usage.
During development, we regularly add/rename/remove fields in the Azure index. In its current state, some schema modifications to an Azure index are not possible; we have to drop and re-create the index.
With a dedicated table containing the data, we simply issue a Reset to our indexer endpoint and the new index gets automatically re-populated.
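For anyone wiring this up, the Azure Search side of the approach is roughly the following (REST sketched in Python; service, connection string, table, and index names are placeholders):

import requests

SERVICE = "https://myservice.search.windows.net"   # placeholder service
API = "api-version=2016-09-01"                     # whichever version you target
HEADERS = {"api-key": "<admin-key>", "Content-Type": "application/json"}

# Data source over the dedicated denormalized table, with SQL integrated
# change tracking so only changed rows are processed on each run.
requests.post(f"{SERVICE}/datasources?{API}", headers=HEADERS, json={
    "name": "assets-search-ds",
    "type": "azuresql",
    "credentials": {"connectionString": "<sql-connection-string>"},
    "container": {"name": "dbo.AssetSearch"},
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
    },
})

# Indexer running on a schedule (PT5M is the minimum interval).
requests.post(f"{SERVICE}/indexers?{API}", headers=HEADERS, json={
    "name": "assets-search-indexer",
    "dataSourceName": "assets-search-ds",
    "targetIndexName": "assets-index",
    "schedule": {"interval": "PT5M"},
})

# After dropping and re-creating the index, a reset plus run repopulates it.
requests.post(f"{SERVICE}/indexers/assets-search-indexer/reset?{API}", headers=HEADERS)
requests.post(f"{SERVICE}/indexers/assets-search-indexer/run?{API}", headers=HEADERS)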
