Querying nested fields in Flink SQL 1.11 - apache-flink

I have a schema that looks like this:
Table: org_table
`transaction_amt` VARCHAR(64) NOT NULL,
`transaction_adj_amt` BIGINT NOT NULL ,
`event_time` TIMESTAMP(3),
`fd_output` ROW<`restime` BIGINT, `outcome` VARCHAR(64)>,
I query the table like this:
SELECT transaction_amt, transaction_adj_amt, event_time, fd_output.restime as response_time, fd_output.outcome as outcome, YEAR(event_time), MONTH(event_time)
FROM org_table
Running this query produces the error below. Is there something I am missing here?
scala.MatchError: CAST (of class org.apache.calcite.sql.fun.SqlCastFunction

Check the argument types of the Flink built-in functions you are calling. YEAR and MONTH, for example, expect a date-typed argument.
The built-in functions are described here:
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/functions/systemfunctions/

Try fd_output.get(0) to access restime, fd_output.get(1) for outcome.

Related

extracting certain size data from column

I have a table in MS SQL Server 2016. The table has a column called notes, of type varchar(255).
The notes column contains free-text notes entered by end users.
Select ServiceDate, notes from my_table
ServiceDate, notes
--------------------------------------
9/1/2022 The order was called in AB13456736
9/1/2022 AB45876453 not setup
9/2/2022 Signature for AB764538334
9/2/2022 Contact for A0943847432
9/3/2022 Hold off on AB73645298
9/5/2022 ** Confirmed AB88988476
9/6/2022 /AB9847654 completed
I would like to extract the AB% value from the notes column. I know the ABxxxxxxxx value is always 10 characters long. Because it is entered at a different position in each note, I cannot tell the function exactly where to look. I have tried substring() and left(), but since the position varies I can't get them to extract the value. Is there a method I can use to do this?
thanks in advance.
Assuming there is only ONE AB{string} in notes, otherwise you would need a Table-Valued Function.
Note the nullif(). This is essentially a Fail-Safe if the string does not exist.
Example
Declare @YourTable Table ([ServiceDate] varchar(50),[notes] varchar(50))
Insert Into @YourTable Values
 ('9/1/2022','The order was called in AB13456736')
,('9/1/2022','AB45876453 not setup')
,('9/2/2022','Signature for AB764538334')
,('9/2/2022','Contact for A0943847432')
,('9/3/2022','Hold off on AB73645298')
,('9/5/2022','** Confirmed AB88988476')
,('9/6/2022','/AB9847654 completed')
Select *
      ,ABValue = substring(notes,nullif(patindex('%AB[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%',notes),0),10)
 from @YourTable
Results
ServiceDate  notes                               ABValue
-----------------------------------------------------------
9/1/2022     The order was called in AB13456736  AB13456736
9/1/2022     AB45876453 not setup                AB45876453
9/2/2022     Signature for AB764538334           AB76453833
9/2/2022     Contact for A0943847432             NULL
9/3/2022     Hold off on AB73645298              AB73645298
9/5/2022     ** Confirmed AB88988476             AB88988476
9/6/2022     /AB9847654 completed                NULL
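For readers who want to check the pattern outside SQL Server, here is a minimal Python sketch of the same idea: find the first "AB" followed by eight digits and return NULL (None here) when nothing matches. The function name extract_ab is made up for illustration, and unlike PATINDEX under a case-insensitive collation, this regular expression is case-sensitive.

```python
import re

# Same shape as the PATINDEX pattern: "AB" followed by eight digits.
AB_PATTERN = re.compile(r"AB[0-9]{8}")


def extract_ab(note):
    """Return the first AB + 8-digit token in note, or None if there is none."""
    match = AB_PATTERN.search(note)
    return match.group(0) if match else None


notes = [
    "The order was called in AB13456736",
    "AB45876453 not setup",
    "Signature for AB764538334",
    "Contact for A0943847432",
    "Hold off on AB73645298",
    "** Confirmed AB88988476",
    "/AB9847654 completed",
]

for note in notes:
    print(note, "->", extract_ab(note))
```

As in the SQL version, a note with more than eight digits after AB yields only the first ten characters, and a note with fewer than eight digits yields no value.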

XPath 'contains()' requires a singleton (or empty sequence)

Given the XML:
<Dial>
<DialID>
24521
</DialID>
<DialName>
Base Price
</DialName>
</Dial>
<Dial>
<DialID>
24528
</DialID>
<DialName>
Rush Options
</DialName>
<DialValue>
1.5
</DialValue>
</Dial>
<Dial>
<DialID>
24530
</DialID>
<DialName>
Bill Rush Charges
</DialName>
<DialValue>
School
</DialValue>
</Dial>
I can use the contains() function in my xpath:
//Dial[DialName[contains(text(), 'Bill')]]/DialValue
To retrieve the values I'm after:
School
The above XML is stored in a field in my SQL database so I'm using the .value method to select from that field.
SELECT Dials.DialDetail.value('(//Dial[DialName[contains(text(), "Bill")]]/DialValue)[1]','VARCHAR(64)') AS BillTo
FROM CampaignDials Dials
I can't seem to get the syntax right though... the xpath works as expected (tested in Oxygen and elsewhere) but when I use it in the XQuery argument of the .value() method, I get an error:
Started executing query at Line 1
Msg 2389, Level 16, State 1, Line 36
XQuery [Dials.DialDetail.value()]: 'contains()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'
Total execution time: 00:00:00.004
I've tried different variations of single and double quotes with no effect. The error refers to an XPath data type for attributes, but I'm not retrieving an attribute; I'm getting the text value. I receive the same error if I type the response with //Dial[DialName[contains(text(), 'Bill')]]/DialValue/text() instead.
What is the correct way to use contains() in an XQuery when it's used in the XML.value() method? Or is this the wrong approach to begin with?
You nearly have it right, you just need [1] on the text() function to guarantee a single value.
You should also use text() on the actual node you are pulling out, for performance reasons.
Also, // can be inefficient, so only use it if you really need recursive descent. You can instead use /*/ to get the first node of any name.
SELECT
Dials.DialDetail.value(
'(//Dial[DialName[contains(text()[1], "Bill")]]/DialValue/text())[1]',
'VARCHAR(64)') AS BillTo
FROM CampaignDials Dials
As Yitzhak Khabinsky notes, this only gets you one value per row of the table. You need .nodes if you want to shred the XML itself into rows.
The difference between your actual database case that fails and your reduced sample case that works is likely one of different data.
The error,
contains() requires a singleton (or empty sequence)
indicates that one of your DialName elements has multiple text node children rather than a single text node child as you're expecting.
You can abstract away such variations by testing the string-value of DialName rather than its text node children:
//Dial[contains(DialName, 'Bill')]/DialValue
See also
Testing text() nodes vs string values in XPath
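To make the difference between text node children and the string-value concrete, here is a small Python sketch using the standard library's ElementTree. The sample data is hypothetical: a Dials wrapper element is added so the fragment parses, and one DialName is given mixed content so that it has two text node children. The helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <Dials> wrapper added, and the second DialName has
# mixed content, so it has two text node children instead of one.
SAMPLE = """<Dials>
  <Dial><DialID>24521</DialID><DialName>Base Price</DialName></Dial>
  <Dial><DialID>24530</DialID><DialName>Bill <b>Rush</b> Charges</DialName><DialValue>School</DialValue></Dial>
</Dials>"""


def string_value(element):
    """Concatenate all descendant text, like the XPath string-value."""
    return "".join(element.itertext())


def direct_text_nodes(element):
    """List the element's own text node children (roughly what text() selects)."""
    parts = [element.text] + [child.tail for child in element]
    return [part for part in parts if part]


def dial_values(root, needle):
    """Return DialValue text for each Dial whose DialName string-value contains needle."""
    result = []
    for dial in root.findall("Dial"):
        name = dial.find("DialName")
        value = dial.find("DialValue")
        if name is not None and value is not None and needle in string_value(name):
            result.append(string_value(value))
    return result


root = ET.fromstring(SAMPLE)
mixed_name = root.findall("Dial")[1].find("DialName")
print(direct_text_nodes(mixed_name))  # two separate text nodes
print(string_value(mixed_name))       # one combined string
print(dial_values(root, "Bill"))
```

Testing the string-value works whether DialName has one text node or several, which is why contains(DialName, 'Bill') avoids the singleton error.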
Here is how to do XML shredding in MS SQL Server correctly.
You need to apply the filter in the XQuery .nodes() method.
The .value() method is just for the actual value retrieval.
It is possible to pass a SQL Server variable as a parameter instead of hard-coding the "Bill" value.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, DialDetail XML);
INSERT INTO @tbl (DialDetail) VALUES
(N'<Dial>
<DialID>24521</DialID>
<DialName>Base Price</DialName>
</Dial>
<Dial>
<DialID>24528</DialID>
<DialName>Rush Options</DialName>
<DialValue>1.5</DialValue>
</Dial>
<Dial>
<DialID>24530</DialID>
<DialName>Bill Rush Charges</DialName>
<DialValue>School</DialValue>
</Dial>');
-- DDL and sample data population, end
SELECT ID
, c.value('(DialID/text())[1]', 'INT') AS DialID
, c.value('(DialName/text())[1]', 'VARCHAR(30)') AS DialName
, c.value('(DialValue/text())[1]', 'VARCHAR(30)') AS DialValue
FROM @tbl CROSS APPLY DialDetail.nodes('/Dial[contains((DialName/text())[1], "Bill")]') AS t(c);
Output
+----+--------+-------------------+-----------+
| ID | DialID | DialName | DialValue |
+----+--------+-------------------+-----------+
| 1 | 24530 | Bill Rush Charges | School |
+----+--------+-------------------+-----------+

How to convert a CharField to a DateTimeField in peewee on the fly?

I have a model that I created on the fly for peewee. Something like this:
class TestTable(PeeweeBaseModel):
    whencreated_dt = DateTimeField(null=True)
    whenchanged = CharField(max_length=50, null=True)
I load data from a text file into a table using peewee. The column "whenchanged" is a varchar column that contains dates in the format '%Y-%m-%d %H:%M:%S'. Now I want to convert the text field "whenchanged" into a datetime value in "whencreated_dt".
I tried several things... I ended up with this:
# Initialize table to TestTable
to_execute = "table.update({table.%s : datetime.strptime(table.%s, '%%Y-%%m-%%d %%H:%%M:%%S')}).execute()" % ('whencreated_dt', 'whencreated')
which fails with a "TypeError: strptime() argument 1 must be str, not CharField": I'm trying to convert "whencreated" to datetime and then assign it to "whencreated_dt".
I tried a variation... following e.g. works without a hitch:
# Initialize table to TestTable
to_execute = "table.update({table.%s : datetime.now()}).execute()" % (self.name)
exec(to_execute)
But this is of course just the current datetime, and not another field.
Does anyone know a solution to this?
Edit... I did find a workaround eventually... but I'm still looking for a better solution... The workaround:
all_objects = table.select()
for o in all_objects:
    datetime_str = getattr(o, 'whencreated')
    setattr(o, 'whencreated_dt', datetime.strptime(datetime_str, '%Y-%m-%d %H:%M:%S'))
    o.save()
Loop over all rows in the table, get the "whencreated". Convert "whencreated" to a datetime, put it in "whencreated_dt", and save each row.
Regards,
Sven
Your example:
to_execute = "table.update({table.%s : datetime.strptime(table.%s, '%%Y-%%m-%%d %%H:%%M:%%S')}).execute()" % ('whencreated_dt', 'whencreated')
Will not work. Why? Because datetime.strptime is a Python function and operates in Python. An UPDATE query works in database-land. How the hell is the database going to magically pass row values into "datetime.strptime"? How would the db even know how to call such a function?
Instead you need to use a SQL function -- a function that is executed by the database. For example, Postgres:
TestTable.update(whencreated_dt=TestTable.whenchanged.cast('timestamp')).execute()
This is the equivalent SQL:
UPDATE test_table SET whencreated_dt = CAST(whenchanged AS timestamp);
That should populate the column for you using the correct data type. For other databases, consult their manuals. Note that SQLite does not have a dedicated date/time data type, and the datetime functionality uses strings in the Y-m-d H:M:S format.
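To see the "let the database do the conversion" idea run end to end without a Postgres server, here is a minimal sketch using Python's built-in sqlite3 module rather than peewee. It is an illustration under assumptions, not the answer's exact method: the table and sample rows are made up, and SQLite's datetime() function stands in for the Postgres CAST, since SQLite keeps date/time values as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (whencreated_dt TEXT, whenchanged TEXT)")
conn.executemany(
    "INSERT INTO test_table (whenchanged) VALUES (?)",
    [("2022-09-01 08:15:00",), ("2022-09-02 17:45:30",)],
)

# A single UPDATE: datetime() is evaluated by SQLite for every row,
# so no Python function ever sees the column values.
conn.execute("UPDATE test_table SET whencreated_dt = datetime(whenchanged)")

rows = conn.execute(
    "SELECT whencreated_dt, whenchanged FROM test_table ORDER BY rowid"
).fetchall()
print(rows)
```

The point is the same as in the peewee call above: the expression on the right-hand side of SET must be something the database can evaluate, and it is applied to all rows in one statement instead of a Python loop.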

DBUnit insists on inserting null for unspecified values, but I want the DB default value to be used

I'm having this problem with DBUnit causing a SQL insert error. Say I have this in my dbunit testdata.xml file:
<myschema.mytable id="1" value1="blah" value2="foo" />
I have a table like this (postgres)
myschema.mytable has an id, value1, value2, and a date field, say "lastmodified." The lastmodified column is timestamp with modifiers "not null default now()"
It appears that dbunit reads the table metadata and attempts to insert nulls for any column that isn't specified in my testdata.xml file. So the above xml results in an insert like this:
insert into myschema.mytable (id,value1,value2,lastmodified) values (1,'blah','foo',null)
When running tests (dbunit/maven plugin) I get an error like this:
Error executing database operation: REFRESH: org.postgresql.util.PSQLException: ERROR: null value in column "lastmodified" violates not-null constraint
Is there some way to tell DBUnit to NOT INSERT null values on fields that I don't specify?
Edit: Using dbunit 2.5.3, junit 4.12, postgresql driver 9.4.1208
Use the dbUnit "exclude column" feature:
How to exclude some table columns at runtime?
The FilteredTableMetaData class introduced in DbUnit 2.1 can be used in combination with the IColumnFilter interface to decide the inclusion or exclusion of table columns at runtime.
FilteredTableMetaData metaData = new FilteredTableMetaData(originalTable.getTableMetaData(), new MyColumnFilter());
ITable filteredTable = new CompositeTable(metaData, originalTable);
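The reason excluding the column helps can be reproduced with any database: an INSERT that names the column and passes NULL violates the constraint, while an INSERT that leaves the column out lets the default apply. Here is a minimal sketch of that behavior using Python's built-in sqlite3 module. It is not DbUnit code, and it uses SQLite's CURRENT_TIMESTAMP as a stand-in for the Postgres "default now()".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    " id INTEGER PRIMARY KEY,"
    " value1 TEXT,"
    " value2 TEXT,"
    " lastmodified TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)

# Naming the column and passing NULL violates the NOT NULL constraint.
try:
    conn.execute(
        "INSERT INTO mytable (id, value1, value2, lastmodified) "
        "VALUES (1, 'blah', 'foo', NULL)"
    )
    explicit_null_failed = False
except sqlite3.IntegrityError:
    explicit_null_failed = True

# Leaving the column out of the INSERT lets the database apply its default.
conn.execute("INSERT INTO mytable (id, value1, value2) VALUES (1, 'blah', 'foo')")
lastmodified = conn.execute(
    "SELECT lastmodified FROM mytable WHERE id = 1"
).fetchone()[0]

print(explicit_null_failed, lastmodified)
```

Filtering lastmodified out of the table metadata is meant to make the generated INSERT look like the second statement rather than the first.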

Create XSD for defined XML and Table

I would like to build an import interface for a SQL Server database that works with XML files. I am stuck on creating an XSD file to ensure the input XML is correct.
Let's say I have a table like this:
table: accounts
columns: account_id INT NOT NULL
         name VARCHAR(20) NOT NULL
         type CHAR(1) NOT NULL
         desc VARCHAR(100)
and the xml file should look like this:
<accounts>
<account>
<account_id>1</account_id>
<name>account A</name>
<type>B</type>
</account>
<account>
<account_id>2</account_id>
<name>account B</name>
<type>D</type>
<desc>some text here</desc>
</account>
</accounts>
It's the first time I am designing something like this and I have no experience with xsd file ...
I tried several things like SELECT .. FOR XML AUTO, XMLSCHEMA and XSD.exe but nothing gave me what I wanted.
I want to map the types of the SQL-Server Table in XSD - such like nullable/not nullable and length of strings. Even a range of valid values should be declared (e.g. type can only be A,B,C or D).
SELECT * FROM [accounts] FOR XML PATH('account'), ROOT('accounts')
I found a solution with XSD.exe: I generated an XSD and then adapted it by hand, but it's hard work if you're totally new to this topic ;-)
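For anyone adapting a generated XSD by hand, it can help to write the rules down in executable form first. The sketch below is plain Python, not XSD, and all names in it are made up for illustration. It checks the constraints from the table above: required elements (minOccurs in XSD), maximum string lengths (the xs:maxLength facet) and the allowed values for type (xs:enumeration).

```python
import xml.etree.ElementTree as ET

# Column rules copied from the accounts table: (required, max length).
RULES = {
    "account_id": (True, None),
    "name": (True, 20),
    "type": (True, 1),
    "desc": (False, 100),
}
ALLOWED_TYPES = {"A", "B", "C", "D"}


def validate_account(account):
    """Return a list of rule violations for one <account> element."""
    errors = []
    for field, (required, max_len) in RULES.items():
        text = account.findtext(field)
        if text is None:
            if required:
                errors.append(f"{field}: missing")
            continue
        if max_len is not None and len(text) > max_len:
            errors.append(f"{field}: longer than {max_len}")
    account_id = account.findtext("account_id")
    if account_id is not None and not account_id.strip().lstrip("-").isdigit():
        errors.append("account_id: not an integer")
    account_type = account.findtext("type")
    if account_type is not None and account_type not in ALLOWED_TYPES:
        errors.append("type: not one of A, B, C, D")
    return errors


DOC = """<accounts>
  <account><account_id>1</account_id><name>account A</name><type>B</type></account>
  <account><account_id>2</account_id><name>account B</name><type>D</type>
    <desc>some text here</desc></account>
</accounts>"""

root = ET.fromstring(DOC)
report = [validate_account(account) for account in root.findall("account")]
print(report)
```

Both sample accounts pass, so each gets an empty list of violations. A real XSD would express the same rules declaratively and let the parser enforce them.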
