I have a simple table of events
CREATE TABLE IF NOT EXISTS events
(
id integer NOT NULL,
events_info jsonb,
createdate timestamp,
updatedate timestamp,
CONSTRAINT events_pkey PRIMARY KEY (id)
)
events_info looks like
[
{
"id": "test_id",
"type": "INTERNAL", // but might be EXTERNAL too
"title": "LOGIN_EVENT",
"message": "Success login",
"createDate": "2022-07-07T11:56:52",
}
]
My idea is to create an index on the type field inside the events_info array. I tried
CREATE INDEX event_info_type_idx ON events USING GIN (jsonb_values_of_key(events_info, 'type') gin_trgm_ops)
but got the error jsonb_values_of_key function does not exist. How can I do this correctly?
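For reference, jsonb_values_of_key is not a built-in PostgreSQL function. A common approach, sketched below rather than taken from the original thread, is to index the whole jsonb column with the jsonb_path_ops operator class and filter with the @> containment operator:
-- A minimal sketch: GIN index over the whole column, queried by containment.
CREATE INDEX event_info_type_idx ON events USING GIN (events_info jsonb_path_ops);

-- The index supports containment queries such as:
SELECT *
FROM events
WHERE events_info @> '[{"type": "INTERNAL"}]';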
I am trying to insert a list of JSON objects in Cassandra. I have made a column named medias but can't figure out how to insert the media items into this column. I tried using a UDT, but it doesn't seem to be working. The maps also have dynamic datatypes. Here is a sample:
[
{
"id": "0c2ed74f-6937-490e-9385-6b1feb16e0bd",
"name": "000009A.mp4",
"mediaType": 1,
"contentSize": 0,
"width": "480.0",
"height": "480.0",
"author": "ae89d4c5-6912-43c7-a6ee-6f740baef818",
"created_at": "2020-08-02T11:12:59Z"
}
]
Can anyone suggest how I can achieve this? Any help would be great.
EDIT
This is how I have defined my table
CREATE TABLE post(
id text,
title text,
content text,
author text,
created_at text,
updated_at text,
medias list<frozen<media>>,
PRIMARY KEY(id,author)
);
And the UDT I defined is:
CREATE TYPE media(
id text,
name text,
mediaType int,
contentSize int,
width text,
height text,
author text,
created_at text
);
If you have a known list of columns, you can insert JSON directly. Any columns not present will have a NULL value. Cassandra is very efficient at managing sparse tables.
CREATE TABLE post(
id text,
title text,
content text,
author text,
created_at text,
updated_at text,
name text,
mediaType int,
contentSize int,
width text,
height text,
PRIMARY KEY(id,author)
);
INSERT INTO post JSON '{
"id": "0c2ed74f-6937-490e-9385-6b1feb16e0bd",
"name": "000009A.mp4",
"mediaType": 1,
"contentSize": 0,
"width": "480.0",
"height": "480.0",
"author": "ae89d4c5-6912-43c7-a6ee-6f740baef818",
"created_at": "2020-08-02T11:12:59Z"
}';
See https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertJSON.html
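Alternatively, if you want to keep the UDT-based schema from the question, INSERT ... JSON can also populate a list<frozen<media>> column directly, with each UDT written as a JSON object. A sketch, where the post id value is made up for illustration:
INSERT INTO post JSON '{
  "id": "post-1",
  "author": "ae89d4c5-6912-43c7-a6ee-6f740baef818",
  "medias": [
    {
      "id": "0c2ed74f-6937-490e-9385-6b1feb16e0bd",
      "name": "000009A.mp4",
      "mediaType": 1,
      "contentSize": 0,
      "width": "480.0",
      "height": "480.0",
      "author": "ae89d4c5-6912-43c7-a6ee-6f740baef818",
      "created_at": "2020-08-02T11:12:59Z"
    }
  ]
}';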
I am using a Scala notebook on Databricks. I need to perform an INSERT of data from a dataframe into a table in SQL Server. If the data already exists there is no need to modify or insert it; only data that does not exist should be inserted.
I tried the methods specified here https://docs.databricks.com/spark/latest/data-sources/sql-databases.html#write-data-to-jdbc, however they don't address my use case. SaveMode.Append creates duplicate entries of the data, SaveMode.Overwrite replaces the existing data (table), and SaveMode.Ignore does not add any new data if the table already exists.
df.write.mode(SaveMode.Overwrite).jdbc(url = dbUrl, table = table_name, connectionProperties = dbConnectionProperties)
How can I do an INSERT of new data only to the database?
Thanks a lot in advance for your help!
Assume your current dataframe is df1.
You should read the existing data in the SQL table into another dataframe (df2).
Then use except, the Dataset equivalent of PySpark's subtract (for key-pair RDDs there is also subtractByKey): http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=subtract
val dfFinal = df1.except(df2)
dfFinal will contain the remaining records to insert.
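Putting it together, a minimal sketch that reuses dbUrl, table_name and dbConnectionProperties from the question (note that except compares entire rows, so df1 and the table should have the same columns):
import org.apache.spark.sql.SaveMode

// Read the rows that already exist in the SQL Server table.
val df2 = spark.read.jdbc(dbUrl, table_name, dbConnectionProperties)

// Keep only the rows of df1 that are not already present, then append them.
val dfFinal = df1.except(df2)
dfFinal.write.mode(SaveMode.Append).jdbc(dbUrl, table_name, dbConnectionProperties)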
NOTE: this is a workaround, not a foolproof solution.
There can be a workaround for this issue.
You need to maintain an auto-increment key/primary key in the SQL Server table, and the source data should carry this key before the insert.
The following conditions can then arise:
New primary key == existing primary key -> the job fails with a constraint exception.
New primary key != existing primary key -> the row is inserted successfully.
The insert failure can then be handled at the program level.
To avoid bringing the entire set over for a comparison, you could put a unique index on the SQL table and use the IGNORE_DUP_KEY option to ignore duplicates.
MS Document: Create Unique Indexes
CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
[ WHERE <filter_predicate> ]
[ WITH ( <relational_index_option> [ ,...n ] ) ]
[ ON { partition_scheme_name ( column_name )
| filegroup_name
| default
}
]
[ FILESTREAM_ON { filestream_filegroup_name | partition_scheme_name | "NULL" } ]
[ ; ]
<object> ::=
{ database_name.schema_name.table_or_view_name | schema_name.table_or_view_name | table_or_view_name }
<relational_index_option> ::=
{
| IGNORE_DUP_KEY = { ON | OFF }
}
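For example, a sketch against a hypothetical target table dbo.TargetTable with a natural key column BusinessKey (both names are illustrative, not from the question):
-- With IGNORE_DUP_KEY = ON, duplicate keys arriving from a SaveMode.Append write
-- are discarded with a warning instead of failing the whole insert.
CREATE UNIQUE NONCLUSTERED INDEX IX_TargetTable_BusinessKey
ON dbo.TargetTable (BusinessKey)
WITH (IGNORE_DUP_KEY = ON);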
I'm trying to create an Azure Stream Analytics query that will process JSON in this format:
{
"deviceid": "02060014440133F0",
"receivedat": "2017-02-24T10:16:50.9081833",
"messageid": "286eded6-dff1-4f6b-85be-ce4c3c050b69",
"telemetryvalues": [
{
"Name": "JUMPER1_2",
"value": "0",
"id": "9be40e7b-7589-4d88-af69-9a00bf71e203",
"telemetryid": "a0259ae9-de01-47fb-9c0c-01fc72c85621",
"scaledvalue": "0"
},
{
"Name": "JUMPER1_2",
"value": "2",
"id": "837c4645-d13a-402f-9cf1-ac36b6bedef8",
"telemetryid": "a0259ae9-de01-47fb-9c0c-01fc72c85621",
"scaledvalue": "0,66"
},
....
]
}
and insert it into two tables (Master - Detail)
I've created the two tables:
CREATE TABLE [dbo].[Telemetry](
[Id] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Telemetry_Id] DEFAULT (NEWSEQUENTIALID()),
[DeviceId] [varchar](20) NULL,
[MessageId] [varchar](40) NULL,
[ReceivedAt] [datetimeoffset](7) NOT NULL,
[CreatedAt] [datetimeoffset](7) DEFAULT (sysutcdatetime()) NOT NULL
)
and
CREATE TABLE [dbo].[TelemetryValues](
[Id] UNIQUEIDENTIFIER CONSTRAINT [DF_TelemetryValue_Id] DEFAULT (NEWSEQUENTIALID()) NOT NULL,
[TelemetryId] VARCHAR(40),
[Name] VARCHAR(28),
[Value] VARCHAR(255) NOT NULL,
[ScaledValue] VARCHAR(255) NOT NULL,
[CreatedAt] [datetimeoffset](7) DEFAULT (sysutcdatetime()) NOT NULL
)
My SA is very simple:
SELECT
*
INTO
[TelemetryData]
FROM [DeviceData]
Where 'TelemetryData' points to my 'Telemetry' SQL table and 'DeviceData' to an eventhub with data.
However, I'm not getting any data into my tables, so I'm not really sure whether SA can insert into two tables or whether I'm doing something wrong.
N.B. If I try to store the data in blob storage instead, the data comes through, so it's not because of missing data.
You can create several tables as several outputs of your ASA job. However, I see your query only writes to one output (TelemetryData).
Also, from what I see, no data is written to the SQL table because there is a mismatch between the schema your query produces and the schema of your table.
E.g. the output of SELECT * will be deviceid, receivedat, messageid, telemetryvalues.
However, the table you created has a different schema with different types.
When you use blobs it works because blobs do not expect a fixed schema; with SQL, however, the schema and types must match exactly.
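As a sketch of the two-output approach: the output names [TelemetryOutput] and [TelemetryValuesOutput] below are assumed and must match the outputs configured on the job, the aliases are chosen to line up with the two table schemas, and field-name casing may need adjusting depending on the job's compatibility level.
-- Master rows go to the output bound to dbo.Telemetry.
SELECT
    deviceid   AS DeviceId,
    messageid  AS MessageId,
    receivedat AS ReceivedAt
INTO [TelemetryOutput]
FROM [DeviceData];

-- Detail rows: flatten the telemetryvalues array, one row per element.
SELECT
    v.ArrayValue.telemetryid AS TelemetryId,
    v.ArrayValue.Name        AS Name,
    v.ArrayValue.value       AS Value,
    v.ArrayValue.scaledvalue AS ScaledValue
INTO [TelemetryValuesOutput]
FROM [DeviceData] AS d
CROSS APPLY GetArrayElements(d.telemetryvalues) AS v;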
Thanks,
JS - Azure Stream Analytics
I am trying to POST some test JSON data to a MS SQL Server database via a REST API hosted by Dreamfactory and I get an error:
"message": "Required id field(s) not found in record 0: Array\n(\n [AdID] => 1\n [DateTime] => 8/22/14\n [ClickedBool] => 1\n)\n",
"code": 400
I have the database configured to autoincrement the ID, I believe:
CREATE TABLE [dbo].[AdViews] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[AdID] INT NOT NULL,
[DateTime] NVARCHAR (MAX) NOT NULL,
[ClickedBool] TINYINT NOT NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
And when I try to POST the data with an ID, I get a SQL error because it is not allowed to explicitly supply an Id value.
Can you post your JSON request?
This link in the documentation is helpful
Types
id : defines a typical table identifier, translates to "int not null auto_increment primary key".
Not recommended, but if all else fails, you can allow inserts into the identity column.
SET IDENTITY_INSERT ADVIEWS ON
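For example, a sketch of a one-off insert with an explicit Id, using the record from the error message (the Id value 1 is made up); remember to switch the option back off afterwards:
SET IDENTITY_INSERT [dbo].[AdViews] ON;

-- The column list is required while IDENTITY_INSERT is ON.
INSERT INTO [dbo].[AdViews] ([Id], [AdID], [DateTime], [ClickedBool])
VALUES (1, 1, '8/22/14', 1);

SET IDENTITY_INSERT [dbo].[AdViews] OFF;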
So I figured out that I had to have at least one record in the table before I could POST; apparently it needs a record to compare against so it can auto-increment the next ID number. The database table was empty when I tried the first time.
I have a table with the following schema:
CREATE TABLE MyTable
(
ID INTEGER IDENTITY(1,1),
FirstIdentifier INTEGER NULL,
SecondIdentifier INTEGER NULL,
--.... some other fields .....
)
Now each of FirstIdentifier and SecondIdentifier is unique but nullable. I want to put a unique constraint on each of these columns, but I cannot do it because they are nullable and can have two rows with NULL values, which would violate the unique constraint. Any ideas on how I can address this at the schema level?
You can use a filtered index as a unique constraint.
create unique index ix_FirstIdentifier on MyTable(FirstIdentifier)
where FirstIdentifier is not null
As several have suggested, using a filtered index is probably the way to get what you want.
But the textbook answer to your direct question is that a nullable column can have a unique index; it will just only be able to have one row with a NULL value in that field. Any more than one NULL would violate the index.
Your question is a little bit confusing. First, in your schema definition, you say that your columns are not allowed to hold null values, but in your description, you say that they can be null.
Anyway, assuming you've got the schema wrong and you actually want the columns to allow null values, SQL Server lets you do this by adding a WHERE ... IS NOT NULL filter to a unique index.
Something along the lines of:
CREATE UNIQUE NONCLUSTERED INDEX IDX_my_index
ON MyTable (firstIdentifier)
WHERE firstIdentifier IS NOT NULL
Do a filtered unique index on the fields:
CREATE UNIQUE INDEX ix_IndexName ON MyTable (FirstIdentifier, SecondIdentifier)
WHERE FirstIdentifier IS NOT NULL
AND SecondIdentifier IS NOT NULL
It will allow NULL but still enforce uniqueness.
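Note that a single composite index like the one above enforces uniqueness of the (FirstIdentifier, SecondIdentifier) pair. If each column must be unique on its own, a sketch with one filtered unique index per column (index names are illustrative):
CREATE UNIQUE INDEX ix_MyTable_FirstIdentifier ON MyTable (FirstIdentifier)
WHERE FirstIdentifier IS NOT NULL;

CREATE UNIQUE INDEX ix_MyTable_SecondIdentifier ON MyTable (SecondIdentifier)
WHERE SecondIdentifier IS NOT NULL;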
You can use a Filter predicate on the CREATE INDEX
From CREATE INDEX
CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON <object> ( column [ ASC | DESC ] [ ,...n ] )
[ INCLUDE ( column_name [ ,...n ] ) ]
[ WHERE <filter_predicate> ]
[ WITH ( <relational_index_option> [ ,...n ] ) ]
[ ON { partition_scheme_name ( column_name )
| filegroup_name
| default
}
]
[ FILESTREAM_ON { filestream_filegroup_name |
partition_scheme_name | "NULL" } ] [ ; ]
WHERE <filter_predicate> Creates a filtered index by specifying which
rows to include in the index. The filtered index must be a
nonclustered index on a table. Creates filtered statistics for the
data rows in the filtered index.
The filter predicate uses simple comparison logic and cannot reference
a computed column, a UDT column, a spatial data type column, or a
hierarchyID data type column. Comparisons using NULL literals are not
allowed with the comparison operators. Use the IS NULL and IS NOT NULL
operators instead.
Here are some examples of filter predicates for the
Production.BillOfMaterials table:
WHERE StartDate > '20040101' AND EndDate <= '20040630'
WHERE ComponentID IN (533, 324, 753)
WHERE StartDate IN ('20040404', '20040905') AND EndDate IS NOT NULL
Filtered indexes do not apply to XML indexes and full-text indexes.
For UNIQUE indexes, only the selected rows must have unique index
values. Filtered indexes do not allow the IGNORE_DUP_KEY option.