Azure Stream Analytics output to Azure Cosmos DB - database

Stream Analytics job (IoT Hub to Cosmos DB output): the "Start" command is failing with the following error:
[12:49:30 PM] Source 'cosmosiot' had 1 occurrences of kind
'OutputDataConversionError.RequiredColumnMissing' between processing
times '2019-04-17T02:49:30.2736530Z' and
'2019-04-17T02:49:30.2736530Z'.
I followed the instructions and am not sure what is causing this error.
Any suggestions, please? Here is the Cosmos DB query:
SELECT
    [bearings temperature],
    [windings temperature],
    [tower sway],
    [position sensor],
    [blade strain gauge],
    [main shaft strain gauge],
    [shroud accelerometer],
    [gearbox fluid levels],
    [power generation],
    [EventProcessedUtcTime],
    [EventEnqueuedUtcTime],
    [IoTHub].[CorrelationId],
    [IoTHub].[ConnectionDeviceId]
INTO
    cosmosiot
FROM
    TurbineData

If you're specifying fields in your query (i.e. SELECT Name, ModelNumber, ...) rather than just using SELECT *, the field names are converted to lowercase by default when using Compatibility Level 1.0, which throws off Cosmos DB. In the portal, if you open your Stream Analytics job, go to 'Compatibility level' under the 'Configure' section, and select 1.1 or higher, that should fix the issue. You can read more about the compatibility levels in the Stream Analytics documentation here: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-compatibility-level
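Concretely (the payloads below are made up; only the property-name casing matters), the output documents differ roughly like this:
Compatibility level 1.0 (property names lowercased):
{ "bearings temperature": 82.1, "eventprocessedutctime": "2019-04-17T02:49:30Z" }
Compatibility level 1.1 or higher (casing preserved as written in the query):
{ "bearings temperature": 82.1, "EventProcessedUtcTime": "2019-04-17T02:49:30Z" }
If the Cosmos DB collection's partition key or id column was defined with the original casing, the lowercased document appears to be missing that required column, which is presumably what OutputDataConversionError.RequiredColumnMissing is complaining about.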

Related

<MSDialect_pyodbc, TIMESTAMP> error when comparing tables in DVT - BigQuery and SQL Server

I'm trying to compare the table data between two databases (on-prem SQL Server and BigQuery). I'm currently using Data-Validation-Tool for that (DVT).
Using the instructions from Github (link: https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/docs), I installed and created a connection for DVT.
When I tried to compare two tables from a single source, it works fine and provides correct output. But when I checked across the two different sources, it returned a dtype: <MSDialect_pyodbc, TIMESTAMP> error.
Details:
I tried to validate a column-level count but still got the same error. Command -
data-validation validate column -sc wc_sql_conn -tc my_bq_conn -tbls EDW.dbo.TableName=project_id.dataset_name.TableName --primary-keys PK_Col_Name --count '*'
Also, when checked for schema-level validation -
data-validation validate schema -sc wc_sql_conn -tc my_bq_conn -tbls EDW.dbo.TableName=project_id.dataset_name.TableName
I also tried adding only INT/STRING columns in a custom query and then comparing that.
Custom Query -
SELECT PK_Col_Name, Category FROM EDW.dbo.TableName
A similar custom query was prepared for BQ; the commands for the custom-query comparison -
data-validation validate custom-query -sc wc_sql_conn -tc my_bq_conn -cqt 'column' -sqf sql_query.txt -tqf bq_query.txt -pk PK_Col_Name --count '*'
data-validation validate custom-query -sc wc_sql_conn -tc my_bq_conn -cqt 'row' -sqf sql_query.txt -tqf bq_query.txt -pk PK_Col_Name --count '*'
Even across these multiple approaches, and even though the scenario doesn't involve a datetime/timestamp column, I only ever get the one error -
NotImplementedError: Could not find signature for dtype: <MSDialect_pyodbc, TIMESTAMP>
I tried to Google the error, but no luck. Could someone please help me identify the cause?
Additionally, there are no data-validation-tool or google-pso-data-validator tags available. If someone could add them, they could be used in the future to reach the right people.
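As background, SQL Server's TIMESTAMP type is an alias for ROWVERSION (a binary row-version stamp, not a date/time), so the source table can carry one even though no datetime column is being compared. A quick, hedged way to check whether EDW.dbo.TableName has such a column (schema and table names taken from the commands above; adjust as needed):
SELECT COLUMN_NAME, DATA_TYPE
FROM EDW.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'TableName'
  AND DATA_TYPE = 'timestamp';
If one turns up, excluding it or casting it (e.g. CAST(col AS BIGINT)) in the custom query is one thing to try, though that is an assumption rather than a confirmed fix.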

Connection to SQL Server database using sql_exporter and Prometheus but can't execute custom metrics on db

So, I've gone through many of the options to get this set up, and none worked with my SQL Server setup except sql_exporter. There is a successful connection and I can read all the built-in metrics, but when I try my own query on a specific database and its table there is always something wrong with my query, such as "Invalid Object" when trying to reach the database. I have attempted to use many resources, but what I would mostly like is a custom metric like the one in https://sysdig.com/blog/monitor-sql-server-prometheus/.
sql_exporter.yml:
# The target to monitor and the collectors to execute on it.
target:
  # Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)
  # the schema gets dropped or replaced to match the driver expected DSN format.
  data_source_name: 'sqlserver://username:password@localhost:1433'

  # Collectors (referenced by name) to execute on the target.
  collectors: [mssql_standard]

# Collector files specifies a list of globs. One collector definition is read from each matching file.
collector_files:
  - "*.collector.yml"
prometheus.yml:
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'sql_server'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9966']
When I tried the custom metric in the post I linked, sql_exporter crashes instantly with no errors. My database is being found in the standard metrics of https://github.com/free/sql_exporter, but I am unsure of the syntax to execute a simple SELECT db_value FROM db_table. I understand there are ways out there and I've tried, so I will need assistance. Thank you in advance!
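For what it's worth, a minimal sketch of a custom collector file in the format https://github.com/free/sql_exporter documents (the collector name, metric name, and three-part table name below are placeholders, not verified against this setup):
my_custom.collector.yml:
collector_name: my_custom
metrics:
  - metric_name: db_table_db_value
    type: gauge
    help: 'db_value read from db_table.'
    values: [db_value]
    query: |
      SELECT db_value FROM YourDatabase.dbo.db_table
It would then be referenced from sql_exporter.yml as collectors: [mssql_standard, my_custom]. Fully qualifying the table (database.schema.table) is also one guess at the "Invalid Object" error, since the DSN above does not select a default database.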

DataflowTemplates debezium connector issue

I am referring to this document to stream data from MySQL (installed on my local machine) to Pub/Sub using the Debezium connector.
My properties file looks like this:
databaseName=testdb
databaseUsername=root
databaseAddress=localhost
databasePort=3306
gcpProject=GCP_project_name
databasePassword=password
whitelistedTables=instance-name.testdb.testtab
singleTopicMode=true
gcpPubsubTopicPrefix=debeziumTest
databaseManagementSystem=mysql
I have already created topic in pubsub with name "debeziumTest".
But the issue is, when I run
sudo mvn exec:java -pl cdc-embedded-connector -Dexec.args="/path/to/properties-file"
it runs without any error, but there is no data uploaded to Pub/Sub.
Based on the documentation, table updates are pushed to a topic that matches this format: ${PREFIX}${DB_INSTANCE}.${DATABASE}.${TABLE}
In your case I believe you should create a topic with the name "debeziumTestinstance-name.testdb.testtab".
This may not be the only problem, based on the warnings I see in the logs you shared.
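Assuming the gcloud CLI is already authenticated against GCP_project_name (the project name taken from the properties file above), creating that topic would be something like:
gcloud pubsub topics create debeziumTestinstance-name.testdb.testtab --project=GCP_project_name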
The problem seems to be with your whitelistedTables.
According to the documentation, you should use ${instance}.${database}.${table}; for your given example it should be whitelistedTables=testdb.databaseName.testTab (if testTab is your table name).

Which open source CEP should I choose for distributed and pipelined processing: Siddhi, Flink, or Esper?

I am a little biased towards Siddhi CEP as it has the Siddhi query language, but it uses Storm for distributed processing, and WSO2 provides a web interface / dashboard to create and deploy applications. I think this will give me less independence to enhance or use some features.
Flink, on the other hand, seems to be a good choice, but it requires a lot of code to implement even simple logic.
Is there a better option than these? I am confused.
What do you mean by less independence? You can use Siddhi 4.x [1] without depending on Storm by using its source and sink features to receive and send messages from one instance to another over TCP, Kafka, HTTP, etc. (see the sketch after the links below).
WSO2 Stream Processor also uses the new version of Siddhi, and with its editor you can simulate events and also debug.
Update: From 4.1, [WSO2 Stream Processor][2] can run on top of Kafka in fully distributed mode. See https://docs.wso2.com/display/SP4xx/Fully+Distributed+Deployment.
[1] https://wso2.github.io/siddhi/
[2] https://wso2.com/analytics
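As a rough illustration of the source/sink idea (the stream names, HTTP endpoint, and JSON mapping below are placeholders, and assume the siddhi-io-http extension is available):
@source(type = 'http', receiver.url = 'http://0.0.0.0:8006/SomeEventStream', @map(type = 'json'))
define stream SomeEventStream (value int);

@sink(type = 'log')
define stream AlertStream (value int);

from SomeEventStream[value == 1]
select value
insert into AlertStream;
Swapping the source/sink type to 'tcp' or 'kafka' is what lets separate Siddhi instances pass events to each other without Storm.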
I would do a test: create 10 queries in each system, something like:
select * from SomeEvent where value = 1
select * from SomeEvent where value = 2
...
select * from SomeEvent where value = 9
select * from SomeEvent where value = 10
The idea is to see how easy it is to create the queries, how the API or deploy steps work and how performance changes with the number of queries.
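For a rough sense of the Flink side of such a test, the first of those queries could look like this in the DataStream API (the event class, field name, and inline source are all illustrative, not taken from the question):
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FilterJob {
    // Simple POJO standing in for "SomeEvent"; the field is illustrative.
    public static class SomeEvent {
        public int value;
        public SomeEvent() {}
        public SomeEvent(int value) { this.value = value; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Stand-in source; in a real test this would be Kafka, a socket, etc.
        DataStream<SomeEvent> events = env.fromElements(new SomeEvent(1), new SomeEvent(2));
        // Rough equivalent of: select * from SomeEvent where value = 1
        events.filter(e -> e.value == 1).print();
        env.execute("filter-some-event");
    }
}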

SOQL - Convert Date To Owner Locale

We use DBAmp for integrating Salesforce.com with SQL Server (it basically adds a linked server), and are running queries against our SF data using OPENQUERY.
I'm trying to do some reporting against opportunities and want to return the created date of the opportunity in the opportunity owner's local date time (i.e. the date time the user will see in Salesforce).
Our DBAmp configuration forces the dates to be UTC.
I stumbled across a date function (in the Salesforce documentation) that I thought might be of some help, but I get an error when I try to use it, so I can't prove it. Below is the example usage for the convertTimezone function:
SELECT HOUR_IN_DAY(convertTimezone(CreatedDate)), SUM(Amount)
FROM Opportunity
GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))
Below is the error returned:
OLE DB provider "DBAmp.DBAmp" for linked server "SALESFORCE" returned message "Error 13005 : Error translating SQL statement: line 1:37: expecting "from", found '('".
Msg 7350, Level 16, State 2, Line 1
Cannot get the column information from OLE DB provider "DBAmp.DBAmp" for linked server "SALESFORCE".
Can you not use SOQL functions in OPENQUERY as below?
SELECT
    *
FROM
    OPENQUERY(SALESFORCE, '
        SELECT HOUR_IN_DAY(convertTimezone(CreatedDate)), SUM(Amount)
        FROM Opportunity
        GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))')
UPDATE:
I've just had some correspondence with Bill Emerson (I believe he is the creator of the DBAmp Integration Tool):
You should be able to use SOQL functions so I am not sure why you are
getting the parsing failure. I'll setup a test case and report back.
I'll update the post again when I hear back. Thanks
A new version of DBAmp (2.14.4) has just been released that fixes the issue with using ConvertTimezone in openquery.
Version 2.14.4
Code modified for better memory utilization
Added support for API 24.0 (SPRING 12)
Fixed issue with embedded question marks in string literals
Fixed issue with using ConvertTimezone in openquery
Fixed issue with "Invalid Numeric" when using aggregate functions in openquery
I'm fairly sure that because DBAmp uses SQL and not SOQL, SOQL functions would not be available, sorry.
You would need to expose this data some other way. Perhaps it's possible with a Salesforce report, a web service, or by compiling the data in the program you are using to access the (DBAmp) SQL Server.
If you were to create a Salesforce web service, the following example might be helpful.
global class MyWebService
{
    webservice static AggregateResult MyWebServiceMethod()
    {
        AggregateResult ar = [
            SELECT
                HOUR_IN_DAY(convertTimezone(CreatedDate)) Hour,
                SUM(Amount) Amount
            FROM Opportunity
            GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))];
        system.debug(ar);
        return ar;
    }
}
