Stream Analytics -> Master - Detail

I'm trying to create an Azure Stream Analytics select that will process json in this format:
{
  "deviceid": "02060014440133F0",
  "receivedat": "2017-02-24T10:16:50.9081833",
  "messageid": "286eded6-dff1-4f6b-85be-ce4c3c050b69",
  "telemetryvalues": [
    {
      "Name": "JUMPER1_2",
      "value": "0",
      "id": "9be40e7b-7589-4d88-af69-9a00bf71e203",
      "telemetryid": "a0259ae9-de01-47fb-9c0c-01fc72c85621",
      "scaledvalue": "0"
    },
    {
      "Name": "JUMPER1_2",
      "value": "2",
      "id": "837c4645-d13a-402f-9cf1-ac36b6bedef8",
      "telemetryid": "a0259ae9-de01-47fb-9c0c-01fc72c85621",
      "scaledvalue": "0,66"
    },
    ....
  ]
}
and insert it into two tables (Master - Detail)
I've created the two tables:
CREATE TABLE [dbo].[Telemetry](
[Id] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Telemetry_Id] DEFAULT (NEWSEQUENTIALID()),
[DeviceId] [varchar](20) NULL,
[MessageId] [varchar](40) NULL,
[ReceivedAt] [datetimeoffset](7) NOT NULL,
[CreatedAt] [datetimeoffset](7) DEFAULT (sysutcdatetime()) NOT NULL
)
and
CREATE TABLE [dbo].[TelemetryValues](
[Id] UNIQUEIDENTIFIER CONSTRAINT [DF_TelemetryValue_Id] DEFAULT (NEWSEQUENTIALID()) NOT NULL,
[TelemetryId] VARCHAR(40),
[Name] VARCHAR(28),
[Value] VARCHAR(255) NOT NULL,
[ScaledValue] VARCHAR(255) NOT NULL,
[CreatedAt] [datetimeoffset](7) DEFAULT (sysutcdatetime()) NOT NULL
)
My SA query is very simple:
SELECT
*
INTO
[TelemetryData]
FROM [DeviceData]
Here 'TelemetryData' points to my 'Telemetry' SQL table and 'DeviceData' to an Event Hub with data.
However, I'm not getting any data into my tables, so I'm not sure whether SA can insert into two tables or I'm doing something wrong.
N.B. If I try to store the data in blob storage instead, the data comes through, so the problem isn't missing data.

You can write to several tables by defining several outputs for your ASA job. However, your query only writes to one output (TelemetryData).
Also, from what I see, no data is written to the SQL table because there is a mismatch between the schema of your query result and the schema of your table.
E.g. the output of SELECT * will be deviceid, receivedat, messageid, telemetryvalues.
However, the table you created has a different schema with different types.
When you use blobs, it works because blobs don't expect a fixed schema. With SQL, however, the schema and types need to match exactly.
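As an illustration, here is a minimal two-query sketch (assuming a second SQL output named [TelemetryValuesData] has been added to the job and points to the TelemetryValues table; the output and alias names are illustrative):
-- Master rows: one per message, matching the Telemetry table's columns
SELECT
    e.deviceid   AS DeviceId,
    e.messageid  AS MessageId,
    e.receivedat AS ReceivedAt
INTO [TelemetryData]
FROM [DeviceData] e

-- Detail rows: one per element of the telemetryvalues array
SELECT
    tv.ArrayValue.telemetryid AS TelemetryId,
    tv.ArrayValue.Name        AS [Name],
    tv.ArrayValue.value       AS [Value],
    tv.ArrayValue.scaledvalue AS ScaledValue
INTO [TelemetryValuesData]
FROM [DeviceData] e
CROSS APPLY GetArrayElements(e.telemetryvalues) AS tv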
Thanks,
JS - Azure Stream Analytics

Related

Create a table in Jupyter Notebook that connects to a database in Beekeeper and AWS

I am trying to generate some tables in a database that I have already created in AWS and connected through Beekeeper. I am using Jupyter Notebook with Python 3 to accomplish this.
However, when I run the cell for my first table (which comes after the cell where I connect to the database using psycopg2), I get a SyntaxError. Here is the code I have been trying:
CREATE TABLE table(
tabID SERIAL PRIMARY KEY NOT NULL,
tabName VARCHAR(30) NOT NULL,
Breed VARCHAR(5) NOT NULL,
Gender VARCHAR(2) NOT NULL,
Weight SMALLINT NOT NULL,
Age NUMERIC(2,1) NOT NULL
);
conn.commit()
The error is showing up with the ^ (caret) under the T in the first table. I've tried moving commas and parentheses around.
File "C:\Users\AppData\Local\Temp\ipykernel_10756\4115600721.py", line 1
CREATE TABLE table(
^
SyntaxError: invalid syntax
I was able to figure it out, and I'm posting the answer for anyone who might run into something similar.
table = ("""CREATE TABLE table (
tabID SERIAL PRIMARY KEY NOT NULL,
tabName VARCHAR(30) NOT NULL,
Breed VARCHAR(5) NOT NULL,
Gender VARCHAR(2) NOT NULL,
Weight SMALLINT NOT NULL,
Age NUMERIC(2,1) NOT NULL);
""")
cursor.execute(table)
conn.commit()

How to make kafka replicate the source table structure in the destination table

I need some advice on how to make Kafka replicate the source table structure in the destination table. Let me explain…
Source db: SQL Server
Source table :
CREATE TABLE dbo.PEOPLE(
ID NUMERIC(10) NOT NULL PRIMARY KEY,
FIRST_NAME VARCHAR(10),
LAST_NAME varchar(10),
AGE NUMERIC(3)
)
Target db: PostgreSQL
Kafka sink connector:
name=pg-sink-connector_people
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
tasks.max=4
topics=myserver.dbo.PEOPLE
connection.url=jdbc:postgresql://localhost:5432/kafkadb
connection.user=postgres
connection.password=mypassword
insert.mode=upsert
pk.mode=record_key
pk.fields=ID
table.name.format=PEOPLE
auto.create=true
offset.storage.file.filename=C:/kafka_2.13-2.7.0/tmp/connect.offsets
bootstrap.servers=localhost:9092
plugin.path=C:/kafka_2.13-2.7.0/plugins
transforms=flatten
transforms.flatten.type=org.apache.kafka.connect.transforms.Flatten$Value
transforms.flatten.delimiter=_
auto.evolve=true
When I run the connector above, Kafka creates a target table on PostgreSQL like this:
CREATE TABLE public."PEOPLE" (
"before_ID" int8 NULL,
"before_FIRST_NAME" text NULL,
"before_LAST_NAME" text NULL,
"before_AGE" int8 NULL,
"after_ID" int8 NULL,
"after_FIRST_NAME" text NULL,
"after_LAST_NAME" text NULL,
"after_AGE" int8 NULL,
source_version text NOT NULL,
source_connector text NOT NULL,
source_name text NOT NULL,
source_ts_ms int8 NOT NULL,
source_snapshot text NULL DEFAULT 'false'::text,
source_db text NOT NULL,
source_schema text NOT NULL,
source_table text NOT NULL,
source_change_lsn text NULL,
source_commit_lsn text NULL,
source_event_serial_no int8 NULL,
op text NOT NULL,
ts_ms int8 NULL,
transaction_id text NULL,
transaction_total_order int8 NULL,
transaction_data_collection_order int8 NULL,
"ID" text NOT NULL,
CONSTRAINT "PEOPLE_pkey" PRIMARY KEY ("ID")
);
The point is I don't need these before/after fields or the others that were created. What's the best way to replicate exactly the structure of my source table?
Thanks!
When you use Debezium, it includes metadata about the change record it has captured, as well as the before and after state. All of this comes through as a nested object.
At the moment you're just flattening all of these fields with the Flatten Single Message Transform. If you don't want the additional fields, you can use the ExtractNewRecordState SMT that Debezium provides for exactly this purpose. Use it instead of Flatten:
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
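With the unwrap SMT in place and auto.create=true, the table the sink creates should track the source much more closely, roughly along these lines (a sketch only; the exact column types depend on the converters and the source schema, so verify the generated DDL on your system):
-- approximate result after switching to ExtractNewRecordState
CREATE TABLE public."PEOPLE" (
"ID" int8 NOT NULL,
"FIRST_NAME" text NULL,
"LAST_NAME" text NULL,
"AGE" int8 NULL,
CONSTRAINT "PEOPLE_pkey" PRIMARY KEY ("ID")
);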

How to delete documents from Filetable?

I am trying to delete some documents from SQL Server's FileTable.
I have one table in which I store all my attachments' details, and the documents themselves are stored in a SQL Server FileTable named Attachments.
The AttachmentDetails table has the schema below:
CREATE TABLE [dbo].[AttachmentDetails](
[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[DocumentName] [nvarchar](max) NULL,
[DocumentType] [nvarchar](max) NULL,
[ModifiedDateTime] [datetime] NOT NULL,
[CreatedDateTime] [datetime] NOT NULL,
[CreatedBy] [nvarchar](254) NULL,
[ModifiedBy] [nvarchar](254) NULL,
[IsDeleted] [bit] NULL
)
Whenever I upload a document to the FileTable, I insert that document's details into the AttachmentDetails table as per the table schema above.
I have tried the solution below:
CREATE PROCEDURE [dbo].[DeleteFiles]
AS
BEGIN
DELETE Attachments
FROM AttachmentDetails a
WHERE
DocumentType = 'video/mp4' AND DATEDIFF(day, a.CreatedDateTime, GETDATE())<11
end
This procedure is supposed to delete only video/mp4 files older than 10 days, but it deletes documents of every type from the FileTable.
SQL is a set-based language; for every cursor/loop-based script there's a far simpler and faster set-based solution. In any case, the way this query is written would result in random deletions, since there's no guarantee what all those TOP 1 queries will return without an ORDER BY clause.
It looks like you're trying to delete all video attachments older than 30 days, and that the date is stored in a separate table called table1. You can write a DELETE statement whose rows come from a JOIN if you use the FROM clause, e.g.:
DELETE Attachments
FROM Attachments inner join table1 a on a.ID=Attachments.ID
WHERE
DocumentType = 'video/mp4' AND
CreatedDateTime < DATEADD(day,-30,getdate())
EDIT
The original query contained DATEADD(day,30,getdate()) when it should be DATEADD(day,-30,getdate())
Example
Assuming we have those two tables :
create table attachments (ID int primary key,DocumentType nvarchar(100))
insert into attachments (ID,DocumentType)
values
(1,'video/mp4'),
(2,'audio/mp3'),
(3,'application/octet-stream'),
(4,'video/mp4')
and
create table table1 (ID int primary key, CreatedDateTime datetime)
insert into table1 (ID,CreatedDateTime)
values
(1,dateadd(day,-40,getdate())),
(2,dateadd(day,-40,getdate())),
(3,getdate()),
(4,getdate())
Executing the DELETE query will only delete the attachment with ID=1. The query
select *
from Attachments
will return:
ID  DocumentType
2   audio/mp3
3   application/octet-stream
4   video/mp4
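Applied to the tables in the question, the same pattern might look roughly like this (assuming the FileTable is dbo.Attachments and that AttachmentDetails.DocumentName matches the FileTable's [name] column; adjust the join column and the date cutoff to your actual schema and retention rule):
-- Delete only video/mp4 files older than 10 days, joining details to the FileTable
DELETE a
FROM dbo.Attachments AS a
INNER JOIN dbo.AttachmentDetails AS d
    ON d.DocumentName = a.[name]
WHERE d.DocumentType = 'video/mp4'
  AND d.CreatedDateTime < DATEADD(day, -10, GETDATE());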

Inserting into a joined view SQL Server

This is a question more about design than about solving a problem.
I created three tables as such
CREATE TABLE [CapInvUser](
[UserId] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](150) NOT NULL,
[AreaId] [int] NULL,
[Account] [varchar](150) NULL,
[mail] [varchar](150) NULL,
[UserLevelId] [int] NOT NULL
)
CREATE TABLE [CapInvUserLevel](
[UserLevelId] [int] IDENTITY(1,1) NOT NULL,
[Level] [varchar](50) NOT NULL
)
CREATE TABLE [CapInvUserRegistry](
[UserRegistryId] [int] IDENTITY(1,1) NOT NULL,
[UserLevelId] int NOT NULL,
[DateRegistry] DATE NOT NULL,
[RegistryStatus] VARCHAR(50) NOT NULL
)
I also have a view that shows all the data from the first table, with AreaId resolved to its varchar identifier, UserLevel resolved to its varchar value, and the registry status joined in from the last table.
Right now when I want to register a new user, I insert into all three tables using separate queries, but I feel like I should have a way to insert into all of them at the same time.
I thought about using a stored procedure to insert, but I still don't know if that would be appropriate.
My questions are:
"Is there a more appropriate way of doing this?"
"Is there a way to create a view that will let me insert over it (without passing the int values manually)?"
-- These are just representations of the tables, not the real ones.
-- I'm still learning how to work with SQL Server properly.
Thank you for your answers and/or guidance.
The most common way of doing this, in my experience, is to write a stored procedure that does all three inserts in the necessary order to create the FK relationships.
This would be my unequivocal recommendation.
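For example, a minimal sketch of such a procedure (parameter names are illustrative, and it assumes the UserLevelId columns reference CapInvUserLevel, as implied by the table definitions):
CREATE PROCEDURE dbo.RegisterCapInvUser
    @Name           VARCHAR(150),
    @AreaId         INT,
    @Account        VARCHAR(150),
    @Mail           VARCHAR(150),
    @Level          VARCHAR(50),
    @RegistryStatus VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- Parent row first, so its IDENTITY value can be reused below
    INSERT INTO CapInvUserLevel ([Level]) VALUES (@Level);
    DECLARE @UserLevelId INT = SCOPE_IDENTITY();

    INSERT INTO CapInvUser ([Name], AreaId, Account, mail, UserLevelId)
    VALUES (@Name, @AreaId, @Account, @Mail, @UserLevelId);

    INSERT INTO CapInvUserRegistry (UserLevelId, DateRegistry, RegistryStatus)
    VALUES (@UserLevelId, CAST(GETDATE() AS DATE), @RegistryStatus);

    COMMIT TRANSACTION;
END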

Avoiding name collisions when creating external tables on SQL Server data warehouse

I am trying to migrate the AdventureWorks database to a SQL Server data warehouse using PolyBase.
Suppose I have a schema HumanResources and a table Department in that schema.
CREATE TABLE [HumanResources].[Department]
(
[DepartmentID] [smallint] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
I need to create an external table for the data of [HumanResources].[Department] before loading the data from Azure blob storage into the SQL Server data warehouse.
CREATE EXTERNAL TABLE ex.TableName
(
[DepartmentID] [smallint] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[ModifiedDate] [datetime] NOT NULL
)
WITH (
LOCATION='/path/',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile
);
I am creating all external tables under an [ex] schema; how should I represent the original schema to avoid collisions?
I cannot do [ex].[HumanResources].[Department], and I would like to avoid creating unnecessary schemas for external tables.
Is there an easy way of representing this?
A common pattern we see is to simply add _ext to the end of the table name. So following your example you'd have the following:
[HumanResources].[Department]
[HumanResources].[Department_ext]
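For instance, a sketch following that pattern, reusing the data source and file format from the question (the CTAS distribution choice here is only an example):
CREATE EXTERNAL TABLE [HumanResources].[Department_ext]
(
    [DepartmentID] [smallint] NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
    [ModifiedDate] [datetime] NOT NULL
)
WITH (
    LOCATION = '/path/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFile
);

-- Load the internal table from the external one with CTAS
CREATE TABLE [HumanResources].[Department]
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT * FROM [HumanResources].[Department_ext];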
