Flink: How to save a list of rows in a database

Right now I am reading rows from a file and saving them to a database using the code below:
String strQuery = "INSERT INTO public.alarm (id, name, marks) VALUES (?, ?, ?)";
JDBCOutputFormat jdbcOutput = JDBCOutputFormat.buildJDBCOutputFormat()
    .setDrivername("org.postgresql.Driver")
    .setDBUrl("jdbc:postgresql://localhost:5432/postgres?user=michel&password=polnareff")
    .setQuery(strQuery)
    .setSqlTypes(new int[] { Types.INTEGER, Types.VARCHAR, Types.INTEGER }) // set the types
    .finish();
DataStream<Row> rows = FilterStream
    .map((tuple) -> {
        Row row = new Row(3);
        row.setField(0, tuple.f0);
        row.setField(1, tuple.f1);
        row.setField(2, tuple.f2);
        return row;
    });
rows.writeUsingOutputFormat(jdbcOutput);
env.execute();
}
}
The above works fine: it picks up rows one by one from the file and saves them in the database.
For example:
If the file contains:
1, mark, 20
then database entry will look like:
id name marks
------------------
1 mark 20
Now the requirement is that for every row I have to create 2 different rows, which should look like this:
For example:
If the file contains:
1, mark, 20
then database entry should look like this:
id name marks
------------------
1 mark-1 20
1 mark-2 20
So I think I should return a List instead of a single Row, and the DataStream variable should look like DataStream<List<Row>> rows.
What should I change in JDBCOutputFormat variable in order to achieve this?
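One possible direction (a sketch, not an answer taken from the original thread): keep a single Row as the stream element and emit two rows per input record with a flat-map step, so the output format should keep receiving one row at a time. In Flink's DataStream API the corresponding operator is flatMap, which passes each call a Collector to emit zero or more elements. The plain-Python sketch below only illustrates the expansion logic; expand and flat_map are hypothetical helper names, not Flink API.

```python
from typing import Iterable, Iterator, Tuple

# (id, name, marks), mirroring the three-field row in the question.
Record = Tuple[int, str, int]


def expand(record: Record, copies: int = 2) -> Iterator[Record]:
    """Yield `copies` rows per input record, suffixing the name with -1, -2, ..."""
    rec_id, name, marks = record
    for n in range(1, copies + 1):
        yield (rec_id, f"{name}-{n}", marks)


def flat_map(records: Iterable[Record]) -> Iterator[Record]:
    """Flatten the per-record expansions into one stream of single rows."""
    for record in records:
        yield from expand(record)


rows = list(flat_map([(1, "mark", 20)]))
print(rows)  # [(1, 'mark-1', 20), (1, 'mark-2', 20)]
```

Because the result is still a flat sequence of three-field rows, the INSERT statement and the declared SQL types would not need to change under this approach.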

Related

Dynamically build a SQL Insert statement based on results from a DataView

I have a legacy data logging industrial app that I'm writing a new interface for. The program lets you select points on devices, save those to a profile, then select devices to apply that profile to. When you apply the profile it creates a table for each device, using the device's unique ID as the table name, and creates a column for each point of data you will be logging, using the unique point ID. For example, I select 3 points of information to datalog; it saves those three as a Profile (in its own table) and then saves the points into the Points table tagged with that Profile:
PointID PointName ProfileID
33 Temp23 1
34 Hum14 1
35 Stat 1
I then select a couple devices and apply that profile which saves to the Device table:
DeviceID DeviceName ProfileID
5 NWUnit 1
6 NEUnit 1
After it saves the devices it creates the table per device such as:
Table Name: DEV5
Column 1: PNT1 - Float
Column 2: PNT2 - Float
Column 3: PNT3 - Bit
As you can see, the table names are directly related to the device ID and the column names directly related to the point ID. I can add/remove points from the profile and it adds/deletes columns as needed. Apply a different profile and the DEV tables get deleted and recreated. Everything works as expected, like the old program that's being replaced.
Now I need to actually do the data logging. I created a simple view:
SELECT dbo.Devices.DeviceID, dbo.Points.PointName, dbo.Points.PointID
FROM dbo.Devices LEFT OUTER JOIN
dbo.Points ON dbo.Devices.ProfileID = dbo.Points.ProfileID
Again so far so good:
DeviceID PointName PointID
5 Temp23 33
5 Hum14 34
5 Stat 35
6 Temp23 33
6 Hum14 34
6 Stat 35
I take this and I throw it in a DataTable, do a Columns.Add("Value") to it to get a blank column, then go through a data retrieval. When it's done I now have the table with the retrieved value:
DeviceID PointName PointID Value
5 Temp23 33 72.34
5 Hum14 34 43.8
5 Stat 35 1
6 Temp23 33 76.80
6 Hum14 34 54.2
6 Stat 35 0
And that's where I'm stuck. I need to take this info, use the DeviceID for the table name and the PointID for the column name, and insert the data. In other words, I need this:
Dim myParamList As New Dictionary(Of String, Object) From {
{"#SampleTime", Date.Now},
{"#DevTable", "Dev" & r.Item("DeviceID")},
HOW DO I CYCLE THROUGH TO GET THE COLUMNS HERE?
}
UpdateDatabase(MySQLConnection, "INSERT INTO #DevTable (SampleTime, AND HERE?) VALUES (#SampleTime, AND HERE)", myParamList)
I cannot figure out the cycling through part. I thought I should use a Count + Group By to find out how many rows have the same device ID, like DeviceID 5 has 3 rows, and use that to cycle through that number of times but I'm just stuck trying to figure out how.
Any suggestions on the best way to do this?
So after struggling with trying to do a GroupBy on a dataview I decided to just do another database query with a Count(*) and GroupBy DeviceID to grab my unique DeviceIDs:
DeviceID RowCount
5 3
6 3
I then used that to loop through the device IDs and used each ID to filter myView as needed. Then I dynamically created a parameterized SQL string and updated the database:
For Each r As DataRow In DevIDDataset.Tables("DeviceIDs").Rows
    myView.RowFilter = "DeviceID=" & r.Item("DeviceID")
    Dim myParamList As New Dictionary(Of String, Object) From {
        {"#SampleTime", Date.Now}
    }
    Dim myFields As String = "SampleTime"
    Dim myValues As String = "#SampleTime"
    For Each row As DataRowView In myView
        Dim myPointID As String = row.Item("PointID")
        myFields += ",obj" & myPointID
        myParamList.Add("#obj" & myPointID, row.Item("RetrievedValue"))
        myValues += ",#obj" & myPointID
    Next
    UpdateDatabase(MySQLConnection, "INSERT INTO dev" & r.Item("DeviceID") & " (" & myFields & ") VALUES (" & myValues & ")", myParamList)
Next
Not pretty but it does what it needs to do and I can't think of any other way to do it.
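The grouping step can also be done in memory, without the extra Count(*) query. The following is a minimal sketch of that idea in Python, using an in-memory SQLite database as a stand-in for the real server. The build_inserts helper, the readings list and the dev/obj naming are assumptions modelled on the code above, not part of the original application.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime

# (DeviceID, PointID, Value) rows, as in the retrieved-values table above.
readings = [
    (5, 33, 72.34), (5, 34, 43.8), (5, 35, 1),
    (6, 33, 76.80), (6, 34, 54.2), (6, 35, 0),
]


def build_inserts(rows, sample_time):
    """Group readings by device and return one (sql, params) pair per device."""
    by_device = defaultdict(list)
    for device_id, point_id, value in rows:
        # Identifiers cannot be bound as parameters, so force them to int
        # before they are interpolated into the SQL text.
        by_device[int(device_id)].append((int(point_id), value))

    statements = []
    for device_id, points in by_device.items():
        columns = ["SampleTime"] + [f"obj{pid}" for pid, _ in points]
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO dev{device_id} ({', '.join(columns)}) VALUES ({placeholders})"
        params = [sample_time] + [value for _, value in points]
        statements.append((sql, params))
    return statements


# Hypothetical per-device tables, created only so the sketch can run.
conn = sqlite3.connect(":memory:")
for dev in (5, 6):
    conn.execute(
        f"CREATE TABLE dev{dev} (SampleTime TEXT, obj33 REAL, obj34 REAL, obj35 INTEGER)"
    )

for sql, params in build_inserts(readings, datetime.now().isoformat()):
    conn.execute(sql, params)

print(conn.execute("SELECT obj33, obj34, obj35 FROM dev5").fetchall())
```

Only the values go through placeholders. The table and column names are built from integers, which keeps the string concatenation safe from injection.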

Error while retrieving data from SQL Server using pyodbc python

My table data has 5 columns and 5288 rows. I am trying to read that data into a CSV file adding column names. The code for that looks like this :
cursor = conn.cursor()
cursor.execute('Select * FROM classic.dbo.sample3')
rows = cursor.fetchall()
print ("The data has been fetched")
dataframe = pd.DataFrame(rows, columns =['s_name', 't_tid','b_id', 'name', 'summary'])
dataframe.to_csv('data.csv', index = None)
The data looks like this
s_sname t_tid b_id name summary
---------------------------------------------------------------------------
db1 001 100 careie hello this is john speaking blah blah blah
It looks like above but has 5288 such rows.
When I try to execute my code mentioned above it throws an error saying :
ValueError: Shape of passed values is (5288, 1), indices imply (5288, 5)
I do not understand what I am doing wrong.
Use pd.read_sql instead, which builds the DataFrame (including the column names) directly from the query:
dataframe = pd.read_sql('Select * FROM classic.dbo.sample3',con=conn)
dataframe.to_csv('data.csv', index = None)
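A commonly reported cause of this error is that fetchall() in pyodbc returns driver-specific Row objects rather than plain tuples, so pandas sees one object per row instead of five values. If pandas is not needed at all, the header and rows can be written with the standard library alone. The sketch below assumes only a DB-API cursor. It uses an in-memory SQLite table as a stand-in for the SQL Server table, and dump_query_to_csv is a hypothetical helper name.

```python
import csv
import io
import sqlite3


def dump_query_to_csv(conn, query, out):
    """Write the result of `query` to the file-like `out`, header row first."""
    cursor = conn.cursor()
    cursor.execute(query)
    writer = csv.writer(out)
    # cursor.description holds one sequence per column; item 0 is the column name.
    writer.writerow([col[0] for col in cursor.description])
    # Convert each driver row object to a plain tuple before writing.
    writer.writerows(tuple(row) for row in cursor)


# Stand-in data: an in-memory SQLite table shaped like the sample row above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample3 (s_name TEXT, t_tid TEXT, b_id INTEGER, name TEXT, summary TEXT)"
)
conn.execute(
    "INSERT INTO sample3 VALUES ('db1', '001', 100, 'careie', 'hello this is john speaking')"
)

buffer = io.StringIO()
dump_query_to_csv(conn, "SELECT * FROM sample3", buffer)
print(buffer.getvalue())
```

For a real file, open it with open('data.csv', 'w', newline='') and pass that in place of the StringIO buffer.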

Csv file to a Lua table and access the lines as new table or function()

Currently my code has simple tables containing the data needed for each object, like this:
infantry = {class = "army", type = "human", power = 2}
cavalry = {class = "panzer", type = "motorized", power = 12}
battleship = {class = "navy", type = "motorized", power = 256}
I use the table names as identifiers in various functions, which are simply called with a table to process its values one by one.
Now I want to have this data stored in a spreadsheet (csv file) instead that looks something like this:
Name class type power
Infantry army human 2
Cavalry panzer motorized 12
Battleship navy motorized 256
The spreadsheet will not have more than 50 lines and I want to be able to increase columns in the future.
I tried a couple of approaches from similar questions I found here, but I failed to access any values from the nested table. I think this is because I don't fully understand how the table is structured after each line of the csv file has been read into it, so I fail to print any values at all.
If there is a way to get the name, class, type and power from the table and use that line just like my old simple tables, I would appreciate an educational example. Another approach could be to declare new tables from the csv, line by line, that behave exactly like my old simple tables. I don't know if this is doable.
Using Lua 5.1
You can read the csv file in as a string. I will use a multi-line string here to represent the csv.
gmatch with pattern [^\n]+ will return each row of the csv.
gmatch with pattern [^,]+ will return the value of each column from a given row.
If more rows or columns are added, or if the columns are moved around, the information will still be converted reliably as long as the first row has the header information.
The only column that cannot move is the first one, the Name column. If it is moved, the key used to store the row in the table will change.
Using gmatch and 2 patterns, [^,]+ and [^\n]+, you can separate the string into each row and column of the csv. Comments in the following code:
local csv = [[
Name,class,type,power
Infantry,army,human,2
Cavalry,panzer,motorized,12
Battleship,navy,motorized,256
]]
local items = {} -- Store our values here
local headers = {} -- Column names taken from the first line
local first = true
for line in csv:gmatch("[^\n]+") do
    if first then -- this is to handle the first line and capture our headers.
        local count = 1
        for header in line:gmatch("[^,]+") do
            headers[count] = header
            count = count + 1
        end
        first = false -- set first to false to switch off the header block
    else
        local name
        local i = 2 -- Start at 2: headers[1] is the Name column, which becomes the key
        for field in line:gmatch("[^,]+") do
            name = name or field -- the first field of the row is its name
            if items[name] then -- if the name is already in the items table then this is a field
                items[name][headers[i]] = field -- assign our value at the header in the table with the given name.
                i = i + 1
            else -- if the name is not in the table we create a new index for it
                items[name] = {}
            end
        end
    end
end
Here is how you can load a csv using the I/O library:
-- Example of how to load the csv.
path = "some\\path\\to\\file.csv"
local f = assert(io.open(path))
local csv = f:read("*all")
f:close()
Alternatively, you can use io.lines(path), which would take the place of csv:gmatch("[^\n]+") in the for loop.
Here is an example of using the resulting table:
-- print table out
print("items = {")
for name, item in pairs(items) do
print(" " .. name .. " = { ")
for field, value in pairs(item) do
print(" " .. field .. " = ".. value .. ",")
end
print(" },")
end
print("}")
The output:
items = {
Infantry = {
type = human,
class = army,
power = 2,
},
Battleship = {
type = motorized,
class = navy,
power = 256,
},
Cavalry = {
type = motorized,
class = panzer,
power = 12,
},
}

Combine 2 rows of CSV file in SSIS

I have one CSV file where the information is spread over two lines:
Line 1 contains Name and age
Line 2 contains detail like address, city, salary, occupation
I want to combine 2 rows to insert it in a database.
CSV file :
Raju, 42
12345 west andheri,Mumbai, 100000, service
In SQL Server I could do this with a cursor, but I have to do it in SSIS.
For a similar case, I would read each line as one column and use a script component to fix the structure. You can follow my answer to the following question, which contains a step-by-step guide:
SSIS reading LF as terminator when its set as CRLF
I like using a script component in order to be able to store data from a different row in this case.
Read the file as a single column CSV into Column1.
Add script component and add a new Output called CorrectedOutput and define all columns from both rows. Also, mark Column1 as read.
Create 2 variables outside of row processing to 'hold' first row
string name = string.Empty;
string age = string.Empty;
Use a split to determine line 1 or line 2
string[] str = Row.Column1.Split(',');
Use an if to determine row 1 or 2
if (str.Length == 2)
{
    name = str[0];
    age = str[1];
}
else
{
    CorrectedOutputBuffer.AddRow();
    CorrectedOutputBuffer.Name = name; // This uses the stored value from prior row
    CorrectedOutputBuffer.Age = age; // This uses the stored value from prior row
    CorrectedOutputBuffer.Address = str[0];
    CorrectedOutputBuffer.City = str[1];
    CorrectedOutputBuffer.Salary = str[2];
    CorrectedOutputBuffer.Occupation = str[3];
}
The overall effect is this...
On Row 1, you just hold the data in variables
On Row 2, you write out the data to 1 new row.
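The hold-then-emit logic is independent of SSIS and can be sketched outside it. The following Python version is an illustration only: combine_pairs is a hypothetical helper, and it assumes, like the script above, that a two-field line always precedes its four-field detail line and that no field contains a comma.

```python
def combine_pairs(lines):
    """Merge each 2-field line with the 4-field detail line that follows it."""
    name = age = None
    for line in lines:
        fields = [part.strip() for part in line.split(",")]
        if len(fields) == 2:
            # First line of a pair: hold the values until the detail line arrives.
            name, age = fields
        elif len(fields) == 4 and name is not None:
            address, city, salary, occupation = fields
            yield {
                "name": name,
                "age": age,
                "address": address,
                "city": city,
                "salary": salary,
                "occupation": occupation,
            }
            # Reset so a stray detail line cannot reuse stale values.
            name = age = None


sample = [
    "Raju, 42",
    "12345 west andheri,Mumbai, 100000, service",
]
records = list(combine_pairs(sample))
print(records)
```

Stripping each field removes the spaces that follow the commas in the sample file, which the split alone would keep.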

Trouble adding elements to a table (array in Lua)

I am attempting to create a table to serve as a small database for users:
users = {}
function create_new_user()
    print("Enter a unique user name (up to 12 letters): ")
    local name = io.read()
    if #name > 12 then
        print("That name is too long.")
        return create_new_user()
    elseif users[name] then
        print("That name is already in use.")
        return create_new_user()
    else
        table.insert(users, 1, name)
        print("Your new user name is: ", users[name])
    end
end
I understood from the manual that the line
table.insert(users, 1, name)
would insert the string value of name as an element of the users array. This is not the case-- whenever I run the script I get the following output:
Your new user name is: nil
You insert the element into the table, but you are trying to retrieve the value indexed by the value of name, which is not what you stored (you are using users[name] instead of users[1]). You can probably do something like this:
table.insert(users, name)
print("Your new user name is: ", name)
Note that table.insert(users, 1, name) may not do what you expect as this will prepend elements to the table. If you insert "abc" and "def" this way, then the users table will include elements {"def", "abc"} (in this particular order). To retrieve the last inserted element you can use users[1].
If you want to store values in a different order, you need to use table.insert(users, name), which will append elements to the table. To retrieve the last element you can use users[#users].
If you always want to store the added element in the first position in the table, then you can simply use users[1] = name.
Here you index the user table with a string (the name):
elseif users[name] then
You do the same here:
print("Your new user name is: ", users[name])
But you store the name with a numerical index:
table.insert(users, 1, name)
What you want instead of that line is:
users[name] = name
Or this (which would require changing the line that follows):
users[name] = true
The idea is you're only really using the keys, to create a lookup table.
