I am importing generated XML files into a SQL Server database. It has been working fine, but now I need additional information from a node that repeats, and I can't figure it out. I am using VBScript with SQLXMLBulkLoad.SQLXMLBulkload.4.0 to import the files. There are many <file> nodes in each file; here is a simplified example:
<?xml version="1.0" encoding="UTF-8"?>
<samp_catalog version='1.0'>
<file>
<name>123.wav</name>
<xyz>
<a_samp>
<a_one>1</a_one>
<a_two>2</a_two>
<a_three>3</a_three>
</a_samp>
<b_samp version='2.0'>
<b_four>sample</b_four>
<b_five>sample</b_five>
<b_six>sample</b_six>
</b_samp>
<c_samp version='1.0'>
<c_entry>
<total_size>0</total_size>
<sig>sample</sig>
<type>SAMPLE</type>
<data_size>256</data_size>
<data encoding='string'></data>
</c_entry>
</c_samp>
<c_samp version='1.0'>
<c_entry>
<total_size>0</total_size>
<sig>sample</sig>
<type>PHONE</type>
<data_size>256</data_size>
<data encoding='string'>15555551212</data>
</c_entry>
</c_samp>
<c_samp version='1.0'>
<c_entry>
<total_size>0</total_size>
<sig>sample</sig>
<type>OTHER_SAMPLE</type>
<data_size>256</data_size>
<data encoding='string'></data>
</c_entry>
</c_samp>
</xyz>
</file>
</samp_catalog>
Without the repeating c_samp node, the schema was easy:
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema" xmlns:sql = "urn:schemas-microsoft-com:mapping-schema">
<xsd:element name = "samp_catalog" sql:is-constant = "1">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "file" sql:relation = "files" maxOccurs = "unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "name" type = "xsd:string" sql:field = "name" />
<xsd:element name = "xyz" sql:is-constant = "1">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "a_samp" sql:is-constant = "1">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "a_one" type = "xsd:integer" sql:field = "a_one" />
<xsd:element name = "a_two" type = "xsd:integer" sql:field = "a_two" />
<xsd:element name = "a_three" type = "xsd:integer" sql:field = "a_three" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name = "b_samp" sql:is-constant = "1">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "b_four" type = "xsd:string" sql:field = "b_four" />
<xsd:element name = "b_five" type = "xsd:string" sql:field = "b_five" />
<xsd:element name = "b_six" type = "xsd:string" sql:field = "b_six" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
The important one is the second data element (of type PHONE), but importing them all would be acceptable as well. I have tried defining all three c_samps in the schema, defining a sql:field for only the specified data element, and defining it for all elements; however, the import always results in NULL for everything under a c_samp element.
How can I get the repeating nodes to import into the database?
I would write a stored procedure in SQL Server and pass over the XML as-is, using SQL Server's ability to read the XML without a schema.
The following code will extract all information from the XML given.
Just do something like
CREATE PROCEDURE dbo.ImportMyXML (@xml XML)
AS
BEGIN
--Code here
END
GO
Within the BEGIN ... END you write this query (the INSERT INTO is commented out so you can test the query's result first):
--INSERT INTO YourTargetTable(col1,col2,col3,...)
SELECT @xml.value(N'(/samp_catalog/file/name/text())[1]',N'nvarchar(max)') AS name
,a_samp.value(N'(*/text())[1]',N'nvarchar(max)') AS a_one
,a_samp.value(N'(*/text())[2]',N'nvarchar(max)') AS a_two
,a_samp.value(N'(*/text())[3]',N'nvarchar(max)') AS a_three
,b_samp.value(N'@version',N'nvarchar(max)') AS b_version
,b_samp.value(N'(*/text())[1]',N'nvarchar(max)') AS b_one
,b_samp.value(N'(*/text())[2]',N'nvarchar(max)') AS b_two
,b_samp.value(N'(*/text())[3]',N'nvarchar(max)') AS b_three
,NodesWith_c_entry.value(N'(../@version)[1]',N'nvarchar(max)') AS c_version
,NodesWith_c_entry.value(N'(total_size/text())[1]','int') AS c_total_size
,NodesWith_c_entry.value(N'(sig/text())[1]','nvarchar(max)') AS c_sig
,NodesWith_c_entry.value(N'(type/text())[1]','nvarchar(max)') AS c_type
,NodesWith_c_entry.value(N'(data_size/text())[1]','int') AS c_data_size
,NodesWith_c_entry.value(N'(data/@encoding)[1]','nvarchar(max)') AS c_data_encoding
,NodesWith_c_entry.value(N'(data/text())[1]','nvarchar(max)') AS c_data_text
FROM @xml.nodes(N'samp_catalog/file/xyz') AS A(xyz)
OUTER APPLY xyz.nodes('a_samp') AS B(a_samp)
OUTER APPLY xyz.nodes('b_samp') AS C(b_samp)
OUTER APPLY xyz.nodes('*/c_entry') AS D(NodesWith_c_entry);
With the XML given the result is this:
+---------+-------+-------+---------+-----------+--------+--------+---------+-----------+--------------+--------+--------------+-------------+-----------------+-------------+
| name | a_one | a_two | a_three | b_version | b_one | b_two | b_three | c_version | c_total_size | c_sig | c_type | c_data_size | c_data_encoding | c_data_text |
+---------+-------+-------+---------+-----------+--------+--------+---------+-----------+--------------+--------+--------------+-------------+-----------------+-------------+
| 123.wav | 1 | 2 | 3 | 2.0 | sample | sample | sample | 1.0 | 0 | sample | SAMPLE | 256 | string | NULL |
+---------+-------+-------+---------+-----------+--------+--------+---------+-----------+--------------+--------+--------------+-------------+-----------------+-------------+
| 123.wav | 1 | 2 | 3 | 2.0 | sample | sample | sample | 1.0 | 0 | sample | PHONE | 256 | string | 15555551212 |
+---------+-------+-------+---------+-----------+--------+--------+---------+-----------+--------------+--------+--------------+-------------+-----------------+-------------+
| 123.wav | 1 | 2 | 3 | 2.0 | sample | sample | sample | 1.0 | 0 | sample | OTHER_SAMPLE | 256 | string | NULL |
+---------+-------+-------+---------+-----------+--------+--------+---------+-----------+--------------+--------+--------------+-------------+-----------------+-------------+
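To run the actual import, un-comment the INSERT INTO and call the procedure with the file's contents. A minimal sketch of the call (the file path is a placeholder; OPENROWSET(BULK ..., SINGLE_BLOB) requires that the SQL Server service account can read the file server-side):

DECLARE @x XML;

-- Read the whole XML file server-side and cast it to the XML type.
SELECT @x = CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK N'C:\import\sample.xml', SINGLE_BLOB) AS f;

EXEC dbo.ImportMyXML @xml = @x;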
So far I haven't found an answer for what I was looking to do, but here is a workaround I'm using for now. In VBScript, after loading the file with Microsoft.XMLDOM, I loop through the file, find all type nodes whose text value is PHONE, then create a new element holding the sibling data element's text value and append it to b_samp.
VBScript:
Set nodes = xmlFile.selectNodes("//type")
For Each node in nodes
    If node.text = "PHONE" Then
        ' Build a <phone> element from the sibling <data> element's text
        Set phoneNode = xmlFile.CreateElement("phone")
        For Each child in node.ParentNode.ChildNodes
            If child.nodeName = "data" Then phoneNode.text = child.text
        Next
        ' Walk up from <type> via <c_entry> and <c_samp> to <xyz>, then
        ' append the new element to its <b_samp> child
        For Each child in node.ParentNode.ParentNode.ParentNode.ChildNodes
            If child.nodeName = "b_samp" Then child.appendChild phoneNode
        Next
    End If
Next
Then save the file with a new name to leave the original intact, and import it with a new schema, adding this line right after the b_six element:
<xsd:element name = "phone" type = "xsd:string" sql:field = "phone" />
Related
I read JSON data from Kafka and tried to process the data with the Flink Table API.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
tEnv.executeSql(
"create table inputTable(" +
"`src_ip` STRING," +
"`src_port` STRING," +
"`bytes_from_src` BIGINT," +
"`pkts_from_src` BIGINT," +
"`ts` TIMESTAMP(2) METADATA FROM 'timestamp'," +
"WATERMARK FOR ts AS ts" +
") WITH (" +
"'connector' = 'kafka'," +
"'topic' = 'test'," +
"'properties.bootstrap.servers' = 'localhost:9092'," +
"'properties.group.id' = 'testGroup'," +
"'scan.startup.mode' = 'earliest-offset'," +
"'format' = 'json'," +
"'json.fail-on-missing-field' = 'true'," +
"'json.ignore-parse-errors' = 'false'" +
")");
Table inputTable = tEnv.from("inputTable");
inputTable.printSchema();
inputTable.execute().print();
Table windowedTable = inputTable
.window(Tumble.over(lit(5).seconds()).on($("ts")).as("w"))
.groupBy($("w"), $("src_ip"))
.select($("w").start().as("window_start"),
$("src_ip"),
$("src_ip").count().as("src_ip_count"),
$("bytes_from_src").avg().as("bytes_from_src_mean")
);
windowedTable.execute().print();
There are 4 records in Kafka. The Flink program prints out the schema info and the inputTable as follows:
Connected to the target VM, address: '127.0.0.1:62348', transport: 'socket'
(
`src_ip` STRING,
`src_port` STRING,
`bytes_from_src` BIGINT,
`pkts_from_src` BIGINT,
`ts` TIMESTAMP(2) *ROWTIME* METADATA FROM 'timestamp',
WATERMARK FOR `ts`: TIMESTAMP(2) AS `ts`
)
+----+--------------------------------+--------------------------------+----------------------+----------------------+-------------------------+
| op | src_ip | src_port | bytes_from_src | pkts_from_src | ts |
+----+--------------------------------+--------------------------------+----------------------+----------------------+-------------------------+
| +I | 44.38.5.31 | 53159 | 120 | 3 | 2021-08-13 14:59:56.00 |
| +I | 44.38.132.51 | 39409 | 100 | 2 | 2021-08-13 14:58:11.00 |
| +I | 44.38.4.44 | 56758 | 336 | 6 | 2021-08-13 14:59:14.00 |
| +I | 44.38.5.34 | 40001 | 80 | 2 | 2021-08-13 14:57:04.00 |
After that, nothing is printed out, and the program does not exit. I am running Flink within IDEA. At this point it seems like a black box: there is no output, and I do not know how to trace a Flink program.
If I comment out the line inputTable.execute().print();, the schema info is printed out, but nothing after that, and the program still does not exit.
The Flink version used is 1.14.2.
I believe those records are being processed and are being added to the window. But event-time windows are triggered by watermarks, and the watermark isn't becoming large enough to trigger the window. To get this to work you need to process an event with a timestamp past the end of the window -- i.e., 2021-08-13 15:00:00.00 or later.
For debugging, the Flink web dashboard is helpful in situations like this: you can see whether events are being processed, examine the watermarks, and so on. See "Flink webui when running from IDE" for help in setting it up.
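A minimal sketch of enabling the dashboard from inside the IDE (this assumes the flink-runtime-web dependency is on the classpath; by default the UI is served at http://localhost:8081):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// In main(): create a local environment that also starts the web dashboard,
// instead of the plain getExecutionEnvironment().
Configuration conf = new Configuration();
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);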
I am working with TypeScript and React and I need help with how to fix this issue. My constructor object expects one of the arguments to be of type Idl, which is basically a JSON generated from Solana. How do I fix this?
Yeah, there is a weird thing with TypeScript and the IdlType on args if you look into the IDL object representation.
It is related to this line:
export declare type IdlType = "bool" | "u8" | "i8" | "u16" | "i16" | "u32" | "i32" | "f32" | "u64" | "i64" | "f64" | "u128" | "i128" | "bytes" | "string" | "publicKey" | IdlTypeDefined | IdlTypeOption | IdlTypeCOption | IdlTypeVec | IdlTypeArray;
The way I fixed it is by using a workaround:
import YOUR_IDL_JSON_OBJECT from '../config/abiSolana/solanaIDL.json'
const a = JSON.stringify(YOUR_IDL_JSON_OBJECT)
const b = JSON.parse(a)
return new Program(b, address, provider)
When you do this, the compiler should not scream at you. If someone cares to explain what the hell is wrong with the enum there, I would be happy. :)
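The likely explanation: when a .json module is imported, TypeScript widens every string literal in it to plain string, so a field such as "type": "u64" is typed as string, which is not assignable to the IdlType union of literal types. JSON.parse returns any, which is assignable to anything, so the stringify/parse round-trip erases the mismatch. A more direct sketch of the same idea, skipping the runtime re-serialization (the package name and Program signature follow the snippet above; address and provider are as in the question):

import { Program, Idl } from "@project-serum/anchor";
import YOUR_IDL_JSON_OBJECT from "../config/abiSolana/solanaIDL.json";

// Cast through `unknown` to assert that the imported JSON really is an Idl.
// Same effect as the JSON.stringify/JSON.parse round-trip, without copying.
const idl = YOUR_IDL_JSON_OBJECT as unknown as Idl;
return new Program(idl, address, provider);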
I managed to extract a time series of prices from a web portal. The data arrives in JSON format, and I convert it into a pandas DataFrame.
Unfortunately, the data for the different bands comes in a text string, and I can't seem to extract it properly.
The JSON data I extract looks like the sample reproduced in the second answer below.
I convert them into a pandas dataframe using this code
data = pd.DataFrame(r.json()['prices'])
and get a DataFrame in which the price columns (closePrice, highPrice, lowPrice, openPrice) each hold a dict of ask/bid/lastTraded values rather than separate numeric columns.
I need to extract (for example) the data in the column ClosePrice out, so that I can do data analysis and cleansing on them.
I tried using
data['closePrice'].str.split(',', expand=True).rename(columns = lambda x: "string"+str(x+1))
but it doesn't really work, since the column holds dicts rather than strings.
Is there any way to either
a) extract the prices within closePrice, bidPrice, etc. into individual columns when converting the JSON to a DataFrame, OR
b) if they are already saved in the DataFrame, pull the values out of them, so that I can extract the prices (e.g. the bid, ask and lastTraded) within each one?
A relatively brute-force way, using links from other Stack Overflow answers:
# load and extract the json data
import requests
import pandas as pd

s = requests.Session()
r = s.post(url + '/session', json=data)  # url and data: session-setup placeholders
loc = <url>
dat1 = s.get(loc)
dat1 = pd.DataFrame(dat1.json()['prices'])
# convert each dict-valued column into individual columns
dat2 = pd.DataFrame()
dat2[['bidC','askC', 'lastP']] = pd.DataFrame(dat1.closePrice.values.tolist(), index= dat1.index)
dat2[['bidH','askH', 'lastH']] = pd.DataFrame(dat1.highPrice.values.tolist(), index= dat1.index)
dat2[['bidL','askL', 'lastL']] = pd.DataFrame(dat1.lowPrice.values.tolist(), index= dat1.index)
dat2[['bidO','askO', 'lastO']] = pd.DataFrame(dat1.openPrice.values.tolist(), index= dat1.index)
dat2['tStamp'] = pd.to_datetime(dat1.snapshotTime)
dat2['volume'] = dat1.lastTradedVolume
This gives the equivalent flattened result.
Use pandas.json_normalize to extract the data from the dict:
import pandas as pd
data = r.json()
# print(data)
{'prices': [{'closePrice': {'ask': 1.16042, 'bid': 1.16027, 'lastTraded': None},
'highPrice': {'ask': 1.16052, 'bid': 1.16041, 'lastTraded': None},
'lastTradedVolume': 74,
'lowPrice': {'ask': 1.16038, 'bid': 1.16026, 'lastTraded': None},
'openPrice': {'ask': 1.16044, 'bid': 1.16038, 'lastTraded': None},
'snapshotTime': '2018/09/28 21:49:00',
'snapshotTimeUTC': '2018-09-28T20:49:00'}]}
df = pd.json_normalize(data['prices'])
Output:
| | lastTradedVolume | snapshotTime | snapshotTimeUTC | closePrice.ask | closePrice.bid | closePrice.lastTraded | highPrice.ask | highPrice.bid | highPrice.lastTraded | lowPrice.ask | lowPrice.bid | lowPrice.lastTraded | openPrice.ask | openPrice.bid | openPrice.lastTraded |
|---:|-------------------:|:--------------------|:--------------------|-----------------:|-----------------:|:------------------------|----------------:|----------------:|:-----------------------|---------------:|---------------:|:----------------------|----------------:|----------------:|:-----------------------|
| 0 | 74 | 2018/09/28 21:49:00 | 2018-09-28T20:49:00 | 1.16042 | 1.16027 | | 1.16052 | 1.16041 | | 1.16038 | 1.16026 | | 1.16044 | 1.16038 | |
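If you want shorter column names like the brute-force answer above, one possible follow-up (a sketch; the rename rule is just an illustration):

import pandas as pd

df = pd.json_normalize(data['prices'])
# Flatten the dotted names, e.g. 'closePrice.ask' -> 'close_ask'
df.columns = [c.replace('Price.', '_') for c in df.columns]
df['snapshotTimeUTC'] = pd.to_datetime(df['snapshotTimeUTC'])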
Using
Ruby 1.9.3-p194
Rails 3.2.8
Here's what I need.
Count the distinct assignments (assignment_id) and divide this by the number of distinct human resources (human_resource_id).
So, the answer for the dummy-data as given below should be:
1.5 assignments per human resource
But I just don't know where to go anymore.
Here's what I tried:
Table name: Assignments
id | human_resource_id | assignment_id | assignment_start_date | assignment_expected_end_date
80101780 | 20200132 | 80101780 | 2012-10-25 | 2012-10-31
80101300 | 20200132 | 80101300 | 2012-07-07 | 2012-07-31
80101308 | 21100066 | 80101308 | 2012-07-09 | 2012-07-17
At first I need to make a selection for the period I need to look at. This is always at most a year back.
a = Assignment.find(:all, :conditions => { :assignment_expected_end_date => (DateTime.now - 1.year)..DateTime.now })
=> [
#<Assignment id: 80101780, human_resource_id: "20200132", assignment_id: "80101780", assignment_start_date: "2012-10-25", assignment_expected_end_date: "2012-10-31">,
#<Assignment id: 80101300, human_resource_id: "20200132", assignment_id: "80101300", assignment_start_date: "2012-07-07", assignment_expected_end_date: "2012-07-31">,
#<Assignment id: 80101308, human_resource_id: "21100066", assignment_id: "80101308", assignment_start_date: "2012-07-09", assignment_expected_end_date: "2012-07-17">
]
foo = a.group_by(&:human_resource_id)
Now I've got a beautiful hash of arrays of objects, and I just don't know what to do next.
Can someone help me?
You can try to execute the request in SQL (multiplying by 1.0 avoids integer division, and dividing assignments by human resources yields assignments per resource):
ActiveRecord::Base.connection.select_value('SELECT count(distinct assignment_id) * 1.0 / count(distinct human_resource_id) AS ratio FROM assignments');
You could do something like
human_resource_count = assignments.collect{|a| a.human_resource_id}.uniq.count
assignment_count = assignments.collect{|a| a.assignment_id}.uniq.count
# to_f avoids integer division; assignments per resource => 1.5 for the sample
result = assignment_count.to_f / human_resource_count
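A sketch combining the one-year filter with the ratio (this assumes Rails 3.2's count with the :distinct option):

scope = Assignment.where(:assignment_expected_end_date => (DateTime.now - 1.year)..DateTime.now)
assignment_count = scope.count(:assignment_id, :distinct => true)
resource_count = scope.count(:human_resource_id, :distinct => true)
ratio = assignment_count.to_f / resource_count # => 1.5 for the dummy data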
I am developing a photo album web application using two tables, USER_INFORMATION and USER_PHOTO. USER_INFORMATION contains one record per user; USER_PHOTO contains one or more photos for a single user. I want to get the user information along with the user's image paths and store it in a Java POJO list, so that I can display it using the display tag in Struts 2.
The tables go like this:
USER_PHOTO                 USER_INFORMATION
=======================    ==================================
| IMAGEPATH | USER_ID |    | USER_NAME | AGE | ADDRESS | ID |
=======================    ==================================
| xyz       | 1       |    | abs       | 34  | sdas    | 1  |
| sdas      | 1       |    | asddd     | 22  | asda    | 2  |
| qwewq     | 2       |    | sadl      | 121 | asd     | 3  |
| asaa      | 1       |    ==================================
| 121       | 3       |
=======================
I somewhat managed to get the user information correctly by using a HashSet and the display tag. The code goes like this:
public ArrayList loadData() throws ClassNotFoundException, SQLException {
ArrayList<UserData> userList = new ArrayList<UserData>();
try {
String name;
String fatherName;
int Id;
int age;
String address;
String query = "SELECT NAME,FATHERNAME,AGE,ADDRESS,ID FROM USER_INFORMATION ,USER_PHOTO WHERE ID=USER_ID";
ps = con.prepareStatement(query);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
name = rs.getString(1);
fatherName = rs.getString(2);
age = rs.getInt(3);
address = rs.getString(4);
//Id = rs.getInt(5);
UserData list = new UserData();
list.setName(name);
list.setFatherName(fatherName);
list.setAge(age);
list.setAddress(address);
//list.setPhot(Id);
userList.add(list);
}
ps.close();
con.close();
} catch (Exception e) {
e.printStackTrace();
}
// Deduplicate with a HashSet -- this only removes duplicates if UserData
// overrides equals() and hashCode(), and it does not preserve row order.
ArrayList al = new ArrayList();
HashSet hs = new HashSet();
hs.addAll(userList);
al.clear();
al.addAll(hs);
//return (userList);
return al;
}
And my display tag goes like this in the JSP:
<display:table id="data" name="sessionScope.UserForm.userList" requestURI="/userAction.do" pagesize="15" >
<display:column property="name" title="NAME" sortable="true" />
<display:column property="fatherName" title="User Name" sortable="true" />
<display:column property="age" title="AGE" sortable="true" />
<display:column property="address" title="ADDRESS" sortable="true" />
</display:table>
As you can see, I am displaying only the user information, not the photos the user uploaded; I want to display the uploaded file names as well. I created another table to store the photo information, but if I include the IMAGEPATH attribute in my SQL query there will be duplicate rows: if a user has uploaded 5 photos, the display tag will show five records with the same user information but different image paths.
So can anyone tell me how to make it display only one record of user information with the user's multiple image paths, so that the user information is not displayed repeatedly?
Thanks In Advance
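One common way to avoid the duplicate rows (a sketch; it assumes UserData is extended with a List<String> imagePaths property and that the query also selects ID and IMAGEPATH) is to keep the join but fold the result set into one UserData per user, keyed on the user's ID:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Inside loadData(), replacing the while (rs.next()) loop:
Map<Integer, UserData> users = new LinkedHashMap<Integer, UserData>();
while (rs.next()) {
    int id = rs.getInt("ID");
    UserData user = users.get(id);
    if (user == null) {
        // First row for this user: populate the shared fields once.
        user = new UserData();
        user.setName(rs.getString("NAME"));
        user.setFatherName(rs.getString("FATHERNAME"));
        user.setAge(rs.getInt("AGE"));
        user.setAddress(rs.getString("ADDRESS"));
        user.setImagePaths(new ArrayList<String>());
        users.put(id, user);
    }
    // Every joined row contributes one image path to that user's list.
    user.getImagePaths().add(rs.getString("IMAGEPATH"));
}
ArrayList<UserData> userList = new ArrayList<UserData>(users.values());

Each user then appears once in userList, and the JSP can render the imagePaths list within a single row (for example by joining the paths into one string for a display:column property).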