SQL Server: JSON formatting grouping possible? - arrays

I have a single SQL Server table of the form:
192.168.1.1, 80 , tcp
192.168.1.1, 443, tcp
...
I am exporting this to JSON like this:
SELECT
    hostIp AS 'host.ip',
    'ports' = (SELECT port AS 'port', protocol AS 'protocol'
               FOR JSON PATH)
FROM
    test
FOR JSON PATH
This query results in:
[
    {
        "host": {
            "ip": "192.168.1.1"
        },
        "ports": [
            {
                "port": 80,
                "protocol": "tcp"
            }
        ]
    },
    {
        "host": {
            "ip": "192.168.1.1"
        },
        "ports": [
            {
                "port": 443,
                "protocol": "tcp"
            }
        ]
    },
    ....
However, I want all data for a single IP grouped like this:
[
    {
        "host": {
            "ip": "192.168.1.1"
        },
        "ports": [
            {
                "port": 80,
                "protocol": "tcp"
            },
            {
                "port": 443,
                "protocol": "tcp"
            }
        ]
    },
    ...
I have found that there appear to be aggregate functions for this, but they either don't work for me or are for PostgreSQL; my example is in SQL Server.
Any idea how to get this to work?

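Assuming a table shaped roughly like this (the column names come from your query; the data types are a guess):
-- Hypothetical setup matching the sample data in the question.
CREATE TABLE dbo.test
(
    hostIp   varchar(45) NOT NULL,
    port     int         NOT NULL,
    protocol varchar(10) NOT NULL
);

INSERT INTO dbo.test (hostIp, port, protocol)
VALUES ('192.168.1.1', 80,  'tcp'),
       ('192.168.1.1', 443, 'tcp');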
The only simple way to do this is to self-join:
SELECT
    tOuter.hostIp AS [host.ip]
    ,ports = (
        SELECT
            tInner.port
            ,tInner.protocol
        FROM test tInner
        WHERE tInner.hostIp = tOuter.hostIp
        FOR JSON PATH
    )
FROM test tOuter
GROUP BY
    tOuter.hostIp
FOR JSON PATH;
Self-joining is obviously inefficient, but SQL Server does not support JSON_AGG. You can simulate it with STRING_AGG and avoid the self-join:
SELECT
    t.hostIp AS [host.ip]
    ,ports = JSON_QUERY('[' + STRING_AGG(tInner.json, ',') + ']')
FROM test t
CROSS APPLY (
    SELECT
        t.port
        ,t.protocol
    FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) tInner(json)
GROUP BY
    t.hostIp
FOR JSON PATH;
As you might imagine, it gets more complex as more and more levels of nesting are involved.
Notes:
Do not use ' to quote column names, only [].
There is no need to alias a column for JSON if it already has that name.
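To make those notes concrete, here is a quick before/after on the aliases (both queries produce the same JSON keys, using the question's test table):
-- String-literal aliases, and aliases that merely repeat the column name (discouraged):
SELECT port AS 'port', protocol AS 'protocol', hostIp AS 'host.ip' FROM test FOR JSON PATH;

-- Bracketed names, aliasing only where the JSON key differs from the column name:
SELECT port, protocol, hostIp AS [host.ip] FROM test FOR JSON PATH;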

Related

Using Volttron aggregation agent

I'm trying to get the aggregate agent to work with TimescaleDB.
https://volttron.readthedocs.io/en/main/developing-volttron/developing-agents/specifications/aggregate.html
I have a file aggregation.config
{
"connection": {
"type": "postgresql",
"params": {
"dbname": "volttrondb",
"host": "127.0.0.1",
"port": 5432,
"user": "user",
"password": "password",
"timescale_dialect": true
}
},
"aggregations":[
# list of aggregation groups each with unique aggregation_period and
# list of points that needs to be collected
{
"aggregation_period": "1h",
"use_calendar_time_periods": true,
"utc_collection_start_time":"2016-03-01T01:15:01.000000",
"points": [
{
"topic_names": ["campus/building/fake/EKG_Cos", "campus/building/fake/EKG_Sin"],
"aggregation_topic_name":"campus/building/fake/avg_of_EKG_Cos_EKG_Sin",
"aggregation_type": "avg",
"min_count": 2
}
]
}
]
}
And I run the following command:
vctl install services/core/SQLAggregateHistorian/ --tag aggregate-historian -c config/aggregation.config --start
It starts correctly: vctl status shows it running and there are no errors in the log.
However, I do not see the point campus/building/fake/avg_of_EKG_Cos_EKG_Sin in the topics table.
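For reference, this is roughly how I am checking for the topic (a sketch; I am assuming the default topics table that the SQL historian creates):
-- Assumption: the aggregation topic should show up in the historian's "topics" table.
SELECT topic_id, topic_name
FROM topics
WHERE topic_name LIKE '%avg_of_EKG_Cos_EKG_Sin%';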
Any suggestions?

80 Postgres databases in Azure Data Factory to copy into

I'm using Azure Data Factory with SQL Server as the source and Postgres as the target. The goal is to copy 30 tables from SQL Server, with transformation, to 30 tables in Postgres. The hard part is that I have 80 source and 80 target databases, all with the exact same layout but different data. It's one database per customer, so 80 customers, each with their own database.
Linked services don't allow parameters for Postgres.
I have one dataset per source and target using parameters for schema and table names.
I have one pipeline per table with SQLServer source and Postgres target.
I can parameterize the SQL Server source in the linked service, but not Postgres.
The problem is: how can I copy 80 source databases to 80 target databases without adding 80 target linked services and 80 target datasets? Plus I'd have to repeat all 30 pipelines per target database.
BTW, I'm only familiar with the UI; however, anything else that does the job is acceptable.
Any help would be appreciated.
There is a simple way to implement this. Essentially, you need a single linked service that reads the connection string out of Key Vault. You can then parameterize the source and target as Key Vault secret names and easily switch between data sources by just changing the secret name. This relies on all connection-related information being enclosed within a single connection string.
I will provide a simple overview for PostgreSQL, but the same logic applies to SQL Server as the source.
Implement a Linked Service for Azure Key Vault.
Add a linked service for Azure PostgreSQL that reads its connection string from Key Vault, stored in the format: Server=your_server_name.postgres.database.azure.com;Database=your_database_name;Port=5432;UID=your_user_name;Password=your_password;SSL Mode=Require;Keepalive=600; (I advise using the server name as the secret name.)
Pass this parameter, which is essentially the correct secret name, into the pipeline (you can also implement a loop that accepts an array of elements and feeds them, a few at a time, into separate pipeline runs).
Linked Service Definition for KeyVault:
{
"name": "your_keyvault_name",
"properties": {
"description": "KeyVault",
"annotations": [],
"type": "AzureKeyVault",
"typeProperties": {
"baseUrl": "https://your_keyvault_name.vault.azure.net/"
}
}
}
Linked Service Definition for Postgresql:
{ "name": "generic_postgres_service".
"properties": {
"type": "AzurePostgreSql",
"parameters": {
"pg_database": {
"type": "string",
"defaultValue": "your_database_name"
}
},
"annotations": [],
"typeProperties": {
"connectionString": {
"type": "AzureKeyVaultSecret",
"store": {
"referenceName": "KeyVaultName",
"type": "LinkedServiceReference"
},
"secretName": "#linkedService().secret_name_for_server"
}
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
}
}
Dataset Definition for Postgresql:
{
"name": "your_postgresql_dataset",
"properties": {
"linkedServiceName": {
"referenceName": "generic_postgres_service",
"type": "LinkedServiceReference",
"parameters": {
"secret_name_for_server": {
"value": "#dataset().secret_name_for_server",
"type": "Expression"
}
}
},
"parameters": {
"secret_name_for_server": {
"type": "string"
}
},
"annotations": [],
"type": "AzurePostgreSqlTable",
"schema": [],
"typeProperties": {
"schema": {
"value": "#dataset().schema_name",
"type": "Expression"
},
"table": {
"value": "#dataset().table_name",
"type": "Expression"
}
}
}
}
Pipeline Definition for Postgresql:
{
"name": "your_postgres_pipeline",
"properties": {
"activities": [
{
"name": "Copy_Activity_1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
...
... i skipped definition
...
"inputs": [
{
"referenceName": "your_postgresql_dataset",
"type": "DatasetReference",
"parameters": {
"secret_name_for_server": "secret_name"
}
}
]
}
],
"annotations": []
}
}

How to get the table-name and database-name in the CDC event received from debezium kafka connect

Setup: I have CDC enabled on MS SQL Server, and the CDC events are fed to Kafka using Debezium Kafka Connect (source). CDC events from more than one table are routed to a single topic in Kafka.
Question: Since I have data from more than one table in the Kafka topic, I would like to have the table name and the database name in the CDC data.
I am getting the table name and database name in MySQL CDC but not in MS SQL CDC.
Below is the Debezium source connector configuration for SQL Server:
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
"name": "cdc-user_profile-connector",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.hostname": "<<hostname>>",
"database.port": "<<port>>",
"database.user": "<<username>>",
"database.password": "<<password>>",
"database.server.name": "test",
"database.dbname": "testDb",
"table.whitelist": "schema01.table1,schema01.table2",
"database.history.kafka.bootstrap.servers": "broker:9092",
"database.history.kafka.topic": "digital.user_profile.schema.audit",
"database.history.store.only.monitored.tables.ddl": true,
"include.schema.changes": false,
"event.deserialization.failure.handling.mode": "fail",
"snapshot.mode": "initial_schema_only",
"snapshot.locking.mode": "none",
"transforms":"addStaticField,topicRoute",
"transforms.addStaticField.type":"org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addStaticField.static.field":"source_system",
"transforms.addStaticField.static.value":"source_system_1",
"transforms.topicRoute.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.topicRoute.regex":"(.*)",
"transforms.topicRoute.replacement":"digital.user_profile",
"errors.tolerance": "none",
"errors.log.enable": true,
"errors.log.include.messages": true,
"errors.retry.delay.max.ms": 60000,
"errors.retry.timeout": 300000
}
}'
I am getting the output below (demo data):
{
    "before": {
        "profile_id": 147,
        "email_address": "test@gmail.com"
    },
    "after": {
        "profile_id": 147,
        "email_address": "test_modified@gmail.com"
    },
    "source": {
        "version": "0.9.4.Final",
        "connector": "sqlserver",
        "name": "test",
        "ts_ms": 1556723528917,
        "change_lsn": "0007cbe5:0000b98c:0002",
        "commit_lsn": "0007cbe5:0000b98c:0003",
        "snapshot": false
    },
    "op": "u",
    "ts_ms": 1556748731417,
    "source_system": "source_system_1"
}
My requirement is to get output like the below:
{
    "before": {
        "profile_id": 147,
        "email_address": "test@gmail.com"
    },
    "after": {
        "profile_id": 147,
        "email_address": "test_modified@gmail.com"
    },
    "source": {
        "version": "0.9.4.Final",
        "connector": "sqlserver",
        "name": "test",
        "ts_ms": 1556723528917,
        "change_lsn": "0007cbe5:0000b98c:0002",
        "commit_lsn": "0007cbe5:0000b98c:0003",
        "snapshot": false,
        "db": "testDb",
        "table": "table1/table2"
    },
    "op": "u",
    "ts_ms": 1556748731417,
    "source_system": "source_system_1"
}
This is planned as part of the https://issues.jboss.org/browse/DBZ-875 issue.
Debezium Kafka Connect generally puts the data from each table in a separate topic, and the topic name encodes the source, e.g. serverName.schemaName.tableName for the SQL Server connector. We generally use the topic name to distinguish between the source table and database.
If you are manually routing the data from all the tables into one topic, then you might have to add the table and database name manually as well.

How can I "shred" JSON data already in a SQL database?

I have successfully imported a JSON file into a SQL Server 2016 database, and I am now attempting to parse that data so that I can populate a table for each of the fields in the column that holds the JSON data. I am no DBA, and it took me a couple of days to figure out how to import this data successfully. I need to know how to accomplish this using SQL. I am not certain what other information I need to provide here, so if anything else is needed, let me know and I will provide it.
The table name is dbo.IncapsulaSourceData. This table has 5 columns: Site_ID, JSON_Source, Processed, Date_inserted, Date_processed.
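For context, a minimal sketch of the table definition (the column names are as stated above; the data types are my assumption):
-- Assumed definition of the staging table; the actual data types may differ.
CREATE TABLE dbo.IncapsulaSourceData
(
    Site_ID        bigint        NOT NULL,
    JSON_Source    nvarchar(max) NOT NULL,
    Processed      bit           NOT NULL,
    Date_inserted  datetime2     NOT NULL,
    Date_processed datetime2     NULL
);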
Here is a sample of the JSON data that is being stored in the JSON_Source column:
{
"site_id":123456,
"statusEnum":"fully_configured",
"status":"fully-configured",
"domain":"site.name.com",
"account_id":111111,
"acceleration_level":"standard",
"site_creation_date":1410815844000,
"ips":[
"99.99.99.99"
],
"dns":[
{
"dns_record_name":"site.name.com",
"set_type_to":"CNAME",
"set_data_to":[
"frgt.x.wafdns.net"
]
}
],
"original_dns":[
{
"dns_record_name":"name.com",
"set_type_to":"A",
"set_data_to":[
""
]
},
{
"dns_record_name":"site.name.com",
"set_type_to":"A",
"set_data_to":[
"99.99.99.99"
]
},
{
"dns_record_name":"site.name.com",
"set_type_to":"CNAME",
"set_data_to":[
""
]
}
],
"warnings":[
],
"active":"active",
"additionalErrors":[
],
"display_name":"site.name.com",
"security":{
"waf":{
"rules":[
{
"action":"api.threats.action.block_ip",
"action_text":"Block IP",
"id":"api.threats.sql_injection",
"name":"SQL Injection"
},
{
"action":"api.threats.action.block_request",
"action_text":"Block Request",
"id":"api.threats.cross_site_scripting",
"name":"Cross Site Scripting"
},
{
"action":"api.threats.action.block_ip",
"action_text":"Block IP",
"id":"api.threats.illegal_resource_access",
"name":"Illegal Resource Access"
},
{
"block_bad_bots":true,
"challenge_suspected_bots":true,
"exceptions":[
{
"values":[
{
"ips":[
"99.99.99.99"
],
"id":"api.rule_exception_type.client_ip",
"name":"IP"
}
],
"id":123456789
},
{
"values":[
{
"ips":[
"99.99.99.99"
],
"id":"api.rule_exception_type.client_ip",
"name":"IP"
}
],
"id":987654321
}
],
"id":"api.threats.bot_access_control",
"name":"Bot Access Control"
},
{
"activation_mode":"api.threats.ddos.activation_mode.auto",
"activation_mode_text":"Auto",
"ddos_traffic_threshold":1000,
"id":"api.threats.ddos",
"name":"DDoS"
},
{
"action":"api.threats.action.quarantine_url",
"action_text":"Auto-Quarantine",
"id":"api.threats.backdoor",
"name":"Backdoor Protect"
},
{
"action":"api.threats.action.block_ip",
"action_text":"Block IP",
"id":"api.threats.remote_file_inclusion",
"name":"Remote File Inclusion"
},
{
"action":"api.threats.action.disabled",
"action_text":"Ignore",
"id":"api.threats.customRule",
"name":"wafRules"
}
]
},
"acls":{
"rules":[
{
"ips":[
"99.99.99.99"
],
"id":"api.acl.whitelisted_ips",
"name":"Visitors from whitelisted IPs"
},
{
"geo":{
"countries":[
"BR",
"NL",
"PL",
"RO",
"RU",
"TR",
"TW",
"UA"
]
},
"id":"api.acl.blacklisted_countries",
"name":"Visitors from blacklisted Countries"
}
]
}
},
"sealLocation":{
"id":"api.seal_location.none",
"name":"No seal "
},
"ssl":{
"origin_server":{
"detected":true,
"detectionStatus":"ok"
},
"generated_certificate":{
"ca":"GS",
"validation_method":"email",
"validation_data":"administrator#site.name.com",
"san":[
"*.site.name.com"
],
"validation_status":"done"
}
},
"siteDualFactorSettings":{
"specificUsers":[
],
"enabled":false,
"customAreas":[
],
"allowAllUsers":true,
"shouldSuggestApplicatons":true,
"allowedMedia":[
"ga",
"sms"
],
"shouldSendLoginNotifications":true,
"version":0
},
"login_protect":{
"enabled":false,
"specific_users_list":[
],
"send_lp_notifications":true,
"allow_all_users":true,
"authentication_methods":[
"ga",
"sms"
],
"urls":[
],
"url_patterns":[
]
},
"performance_configuration":{
"advanced_caching_rules":{
"never_cache_resources":[
],
"always_cache_resources":[
]
},
"acceleration_level":"standard",
"async_validation":true,
"minify_javascript":true,
"minify_css":true,
"minify_static_html":true,
"compress_jepg":true,
"progressive_image_rendering":false,
"aggressive_compression":false,
"compress_png":true,
"on_the_fly_compression":true,
"tcp_pre_pooling":true,
"comply_no_cache":false,
"comply_vary":false,
"use_shortest_caching":false,
"perfer_last_modified":false,
"accelerate_https":false,
"disable_client_side_caching":false,
"cache300x":false,
"cache_headers":[
]
},
"extended_ddos":1000,
"res":0,
"res_message":"OK",
"debug_info":{
"id-info":"1234"
}
}
Here is an idea of how to parse top-level and second-level data from the JSON:
select top 100
    isd.Site_ID,
    isd.JSON_Source,
    isd.Processed,
    isd.Date_inserted,
    isd.Date_processed,
    x.site_id,
    x.statusEnum,
    x.[status],
    x.[domain],
    x.account_id,
    x.acceleration_level,
    x.site_creation_date,
    x.ips as ipsJson,
    x.dns as dnsJson,
    ipsArr.[key] as ipKey,
    ipsArr.[value] as [ip],
    dnsArr.dns_record_name,
    dnsArr.set_type_to,
    dnsArr.set_data_to
from dbo.IncapsulaSourceData isd
outer apply openjson(isd.JSON_Source)
with (
    site_id bigint,
    statusEnum nvarchar(max),
    [status] nvarchar(max),
    [domain] nvarchar(max),
    account_id bigint,
    acceleration_level nvarchar(max),
    site_creation_date bigint,
    ips nvarchar(max) as json,
    dns nvarchar(max) as json
    -- JSON_Source contains only one object, so the original rows
    -- won't be duplicated. But that object contains some arrays,
    -- e.g. ips and dns. We don't parse them in this apply, so no problem here.
) as x
outer apply openjson(isd.JSON_Source, '$.ips') as ipsArr
-- Here we parse the arrays (ips above and dns below).
-- For a given source row, if there are 10 ipsArr values then the output
-- will contain 10 rows <source row, ipsArr.value>.
-- You may want to parse these arrays in a separate query instead.
outer apply openjson(isd.JSON_Source, '$.dns')
with (
    dns_record_name nvarchar(max),
    set_type_to nvarchar(max),
    set_data_to nvarchar(max) as json
) as dnsArr
-- other outer applies here
Sorry, I wrote this without checking it in an IDE, so there may be some mistakes.
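For the deeper levels the same pattern repeats: one more outer apply per array you want to flatten. A rough sketch for the nested security.waf.rules array, assuming the same table and property names as above:
-- Flatten the nested security.waf.rules array; properties a given rule
-- does not have (e.g. "action" on the DDoS rule) simply come back as NULL.
select
    isd.Site_ID,
    wafRule.id          as waf_rule_id,
    wafRule.name        as waf_rule_name,
    wafRule.[action]    as waf_rule_action,
    wafRule.action_text as waf_rule_action_text
from dbo.IncapsulaSourceData isd
outer apply openjson(isd.JSON_Source, '$.security.waf.rules')
with (
    id          nvarchar(max) '$.id',
    name        nvarchar(max) '$.name',
    [action]    nvarchar(max) '$.action',
    action_text nvarchar(max) '$.action_text'
) as wafRule;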
p.s. maaaaaan, your inconsistent naming makes me cry!

How to configure Opserver for SQL Server clusters

I am trying out Opserver to monitor SQL Server instances. I have no issue configuring standalone instances, but when I try to configure SQL Server clusters using the method documented here: http://www.patrickhyatt.com/2013/10/25/setting-up-stackexchanges-opserver.html
I am confused about where to put the SQL Server cluster named instance and the Windows node server names.
In the JSON code below:
{
"defaultConnectionString": "Data Source=$ServerName$;Initial Catalog=master;Integrated Security=SSPI;",
"clusters": [
{
"name": "SDCluster01",
"nodes": [
{ "name": "SDCluster01\\SDCluster01_01" },
{ "name": "SDCluster02\\SDCluster01_02" },
]
},
],
I assume SDCluster01 is the instance DNS name and SDCluster01_01 and SDCluster01_02 are the Windows node server names.
But what if I have a named instance (clustered) like SDCluster01\instance1?
I tried to configure it like this:
{
"defaultConnectionString": "Data Source=$ServerName$;Initial Catalog=master;Integrated Security=SSPI;",
"clusters": [
{
"name": "SDCluster01\instance1",
"nodes": [
{ "name": "SDCluster01\\SDCluster01_01" },
{ "name": "SDCluster02\\SDCluster01_02" },
]
},
],
But after deploying to Opserver it gave me this error message:
[NullReferenceException: Object reference not set to an instance of an object.]
Any ideas on how to configure the JSON file correctly for SQL Server clusters?
