I'm working through a stored procedure and wondering if there's a way to retrieve the anticipated result column list from a SQL statement before fully executing it.
Scenarios:
dynamic SQL
a UDF that might vary the columns outside of our control
EX:
// inbound parameter
SET QUERY_DEFINITION_ID = 12345;
// initial statement pulls query text from bank of queries
var sqlText = getQueryFromQueryBank(QUERY_DEFINITION_ID);
// now we run our query
var cmd = {sqlText: sqlText};
var stmt = snowflake.createStatement(cmd);
What I'd like to be able to do is say "right - before you run this, give me the anticipated column list" so I can compare it to what's expected.
EX:
Expected: [col1, col2, col3, col4]
Got: [col1]
Result: Oops. Don't run.
The rationale here is that I want to short-circuit execution if something is missing, before the query potentially runs for a while. I can validate all of this after the fact, but it would be really helpful to stop early.
Any ideas very much appreciated!
This sample SP code shows how to get the list of columns that a query will project into the result before you run the query. It should only be used for large, long-running queries, because getting the column list itself takes a few seconds.
There are a couple of caveats. 1) It will only return the names of the columns. It won't tell you how they were built, that is, whether they're aliased, direct from a table, calculated, etc. 2) The example query I used is straight from the Snowflake documentation here: https://docs.snowflake.com/en/user-guide/sample-data-tpcds.html#functional-query-definition. For convenience, I minimized the query to a single line. The column output includes object qualifiers in addition to the column names, so V1.I_CATEGORY, V1.D_YEAR, V1.D_MOY, etc. If you don't want them, to make it easier to compare names, you can strip off the qualifiers using the JavaScript split function on the dot and take index 1 of the resulting array (there's a sketch of that comparison after the procedure).
create or replace procedure EXPLAIN_BEFORE_RUNNING()
returns string
language javascript
execute as caller
as
$$
// Set the context for the session to the TPC-DS sample data:
executeNonQuery("use schema snowflake_sample_data.tpcds_sf10tcl;");
// Here's a complex query from the Snowflake docs (minimized to one line for convenience):
var sql = `with v1 as( select i_category, i_brand, cc_name, d_year, d_moy, sum(cs_sales_price) sum_sales, avg(sum(cs_sales_price)) over(partition by i_category, i_brand, cc_name, d_year) avg_monthly_sales, rank() over (partition by i_category, i_brand, cc_name order by d_year, d_moy) rn from item, catalog_sales, date_dim, call_center where cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and cc_call_center_sk= cs_call_center_sk and ( d_year = 1999 or ( d_year = 1999-1 and d_moy =12) or ( d_year = 1999+1 and d_moy =1)) group by i_category, i_brand, cc_name , d_year, d_moy), v2 as( select v1.i_category ,v1.d_year, v1.d_moy ,v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum from v1, v1 v1_lag, v1 v1_lead where v1.i_category = v1_lag.i_category and v1.i_category = v1_lead.i_category and v1.i_brand = v1_lag.i_brand and v1.i_brand = v1_lead.i_brand and v1.cc_name = v1_lag.cc_name and v1.cc_name = v1_lead.cc_name and v1.rn = v1_lag.rn + 1 and v1.rn = v1_lead.rn - 1) select * from v2 where d_year = 1999 and avg_monthly_sales > 0 and case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 order by sum_sales - avg_monthly_sales, 3 limit 100;`;
// Before actually running the query, generate an explain plan.
executeNonQuery("explain " + sql);
// Now read the column list from the explain plan from the result set.
var columnList = executeSingleValueQuery("COLUMN_LIST", `select "expressions" as COLUMN_LIST from table(result_scan(last_query_id())) where "operation" = 'Result';`);
// Your comparison against the expected column list would go here...
// For now, just exit with the column list as the output:
return columnList;
// Helper functions:
function executeNonQuery(queryString) {
    var cmd = {sqlText: queryString};
    var stmt = snowflake.createStatement(cmd);
    stmt.execute();
}
function executeSingleValueQuery(columnName, queryString) {
    var cmd = {sqlText: queryString};
    var stmt = snowflake.createStatement(cmd);
    var rs;
    try {
        rs = stmt.execute();
        rs.next();
        return rs.getColumnValue(columnName);
    }
    catch (err) {
        if (err.message.substring(0, 18) == "ResultSet is empty") {
            throw "ERROR: No rows returned in query.";
        } else {
            throw "ERROR: " + err.message.replace(/\n/g, " ");
        }
    }
}
$$;
call Explain_Before_Running();
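To round this out with the comparison the question asks for: once you have the column list, it's just string cleanup plus a membership check. Here is a minimal sketch of that logic in Python (expected_columns and the sample input are hypothetical, and the explain-plan "expressions" value is assumed to be a comma-separated string of qualified names; inside the stored procedure you would do the same thing in JavaScript with split, as described above):

# Strip the object qualifiers (V1.I_CATEGORY -> I_CATEGORY), then compare
# the result against the expected column list before running the real query.
expected_columns = ["I_CATEGORY", "D_YEAR", "D_MOY"]  # hypothetical expected list

def strip_qualifiers(column_list):
    # Split each qualified name on the dot and take index 1, as described above.
    return [name.strip().split(".")[1] for name in column_list.split(",")]

got = strip_qualifiers("V1.I_CATEGORY, V1.D_YEAR")  # hypothetical explain output
missing = [c for c in expected_columns if c not in got]
if missing:
    # Short-circuit: don't run the long query at all.
    raise RuntimeError("Oops. Don't run. Missing: " + ", ".join(missing))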
I have these CSV files on hand that I have to upload to a remote database, and I've used pyodbc with the csv library in Python to do it. I don't know why, but it's insanely slow (about 30 seconds for 100 rows), and some of the CSV files I have to upload have over 30k rows. I've tried using pandas as well, but there was no change in speed.
This is more or less my code; unnecessary parts have been omitted.
import csv
import sys
import time

import pyodbc

# server, database, username, password and date are defined in the omitted parts
if len(sys.argv) == 1:
    print("This program needs an input state")
    exit()

state_code = str(sys.argv[1])
f = open(state_code + ".csv", "r")
reader = csv.reader(f, delimiter=',')

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = cnxn.cursor()

insert_query = '''INSERT INTO table (Zipcode, Pers_Property_Coverage, Deductible,
                                     Liability, Average_Rate,
                                     Highest_Rate, Lowest_Rate, CREATE_DATE, Active_Flag)
                  VALUES(?,?,?,?,?,?,?,?,?)'''

ctr = 0
start_time = time.time()
for row in reader:
    zipcode = row[0]
    if len(zipcode) == 4:
        zipcode = "0" + zipcode
    ppc = row[1][1:].replace(',', '')
    deductible = row[2][1:].replace(',', '')
    liability = row[3][1:].replace(',', '')
    average_rate = row[4][1:].replace(',', '')
    highest_rate = row[5][1:].replace(',', '')
    lowest_rate = row[6][1:].replace(',', '')
    ctr = ctr + 1
    if ctr % 100 == 0:
        print("Time Elapsed = ", round(time.time() - start_time), " seconds")
    values = (zipcode, ppc, deductible, liability, average_rate, highest_rate, lowest_rate, date, "Y")
    print("Inserting " + zipcode, ppc, deductible, liability, average_rate, highest_rate, lowest_rate, date, "Y")
    cursor.execute(insert_query, values)
    cnxn.commit()
Updating your code to use pyodbc's executemany with the option fast_executemany=True could be an easy way to save time:
https://github.com/mkleehammer/pyodbc/wiki/Cursor#executemanysql-params-with-fast_executemanytrue
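In practice that means turning on the flag on the cursor and passing all the parameter tuples in a single call, instead of one execute and one commit per row. Here's a minimal sketch against the same table and INSERT as above (connection_string is a stand-in for the full connect string; insert_query, state_code and date are reused from the question's code):

import csv

import pyodbc

cnxn = pyodbc.connect(connection_string)
cursor = cnxn.cursor()
cursor.fast_executemany = True  # pyodbc batches the parameter arrays for you

def clean(value):
    # Same per-column cleanup as the question: drop the leading character
    # and any thousands separators.
    return value[1:].replace(',', '')

with open(state_code + ".csv", "r") as f:
    params = []
    for row in csv.reader(f, delimiter=','):
        zipcode = row[0].zfill(5)  # pads short zipcodes, like the "0" + zipcode fix
        params.append((zipcode, clean(row[1]), clean(row[2]), clean(row[3]),
                       clean(row[4]), clean(row[5]), clean(row[6]), date, "Y"))

cursor.executemany(insert_query, params)  # one batched call instead of 30k executes
cnxn.commit()  # commit once at the end instead of once per row

Committing once at the end rather than inside the loop is itself a meaningful saving, independent of fast_executemany.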
Exploring a BULK INSERT from your file could be another option, although it would most likely not use pyodbc or Python:
https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15
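If you do want to try it, note that BULK INSERT reads the file on the SQL Server machine (or a share it can reach), not on the client running your script, and it loads the file as-is, so the per-column cleanup above would have to happen before the file is staged. You can still issue the statement through the existing pyodbc connection; a rough sketch with a hypothetical server-side path:

# Hypothetical server-visible path; adjust the terminators to match the file.
bulk_sql = r"""
BULK INSERT dbo.table
FROM 'C:\data\NY.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
"""
cursor.execute(bulk_sql)
cnxn.commit()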
I'm not sure why I can't test my function. My desired output is ID, then Room, but if there are multiple rooms for the same ID, each room should go in a new row, turning
ID Room
1 SW128 SW143
into
ID Room
1 SW128
1 SW143
This is some of the data in the file.
1,SW128,SW143
2,SW309
3,AA205
4,AA112,SY110
5,AC223
6,AA112,AA206
but I can't even test my function. Can anyone please help me fix this?
def create_location_table(db, loc_file):
    '''Location table has format ID, Room'''
    con = sqlite3.connect(db)
    cur = con.cursor()
    cur.execute('''DROP TABLE IF EXISTS Locations''')
    # create the table
    cur.execute('''CREATE TABLE Locations (id TEXT, Room TEXT)''')
    # Add the rows
    loc_file = open('locations.csv', 'r')
    loc_file.readline()
    for line in loc_file:
        d = {}
        data = line.split(',')
        ID = data[0]
        Room = data[1:]
        for (ID, Room) in d.items():
            if Room not in d:
                d[ID] = [Room]
                for i in Rooms:
                    cur.execute(''' INSERT INTO Locations VALUES(?, ?)''', (ID, Room))
    # commit and close cursor and connection
    con.commit()
    cur.close()
    con.close()
The problem is that d is always an empty dict, so the for (ID, Room) in d.items() loop never runs. What you need to do is loop over the rooms, and you don't need the d dict at all.
import sqlite3

def create_location_table(db, loc_file):
    '''Location table has format ID, Room'''
    con = sqlite3.connect(db)
    cur = con.cursor()
    cur.execute('''DROP TABLE IF EXISTS Locations''')
    # create the table
    cur.execute('''CREATE TABLE Locations (id TEXT, Room TEXT)''')
    # open the CSV
    csv_content = open(loc_file, 'r')
    for line in csv_content:
        data = line.strip().split(',')
        # made lowercase following PEP 8, but 'id' is a built-in name in Python
        idx = data[0]
        rooms = data[1:]
        # loop through the rooms of this line and insert one row per room
        for room in rooms:
            cur.execute(''' INSERT INTO Locations VALUES(?, ?)''', (idx, room))
            # for debug purposes only
            print('INSERT INTO Locations VALUES(%s, %s)' % (idx, room))
    # commit and close cursor and connection
    con.commit()
    cur.close()
    con.close()

# call the method
create_location_table('db.sqlite3', 'locations.csv')
Note: Following PEP 8 I made your variables lowercase.
EDIT: post full code example, use loc_file parameter
I have a web poll application that reads and processes orders from a SQL Server back end and inserts them into WinDSS (JDA).
The flow:
1) I read all the orders from the SQL Server tables that are not processed
2) then I need to divide the orders into two sets
- set 1: orders with a bundle kit (1 order with several suborders)
- set 2: orders with no bundle kit
3) then after each order is processed, I need to update the isprocessed SQL field to 1
1) code for reading all orders
Frmmain.vb
dt = WebDatabase.GetAllOrdersFromDatabase() ' this method is below
For Each drow In dt.Rows
    If CDbl(WinDSSStoreNumber) = 123 Or CDbl(WinDSSStoreNumber) = 124 Then
        ' Store 123 and 124 = customer service - only orders with qualifying source codes
My question is:
The above For Each checks whether the store is 123 or 124. Now I want to implement another For Each loop that reads each InvoiceHeader_Id (the second field in the DataTable) and checks whether it has a bundle kit. If it does, process it and update the isprocessed field (second-last in the DataTable); otherwise, process the orders without any bundle kit and update isprocessed for those as well. Any help is greatly appreciated; please ask me any questions in the comments below before marking my question down.
WebData.vb
Public Function GetAllOrdersFromDatabase() As DataTable
    DrivePath = "C:\Users\somjething\Documents\files\somefiles\"
    Dim ds As DataTable = Nothing
    WinDSS.connectionString = ConfigurationManager.AppSettings("WinDSS_Connection").Replace("%DrivePath%", DrivePath)
    ds = WinDSS.GetSysMst()
    If ds.Rows.Count > 0 Then
        WinDSSStoreNumber = ds.Rows(0)("store_no")
    End If
    Dim dt As New DataTable()
    Dim conn As New SqlConnection(ConfigurationManager.AppSettings("WebData_Connection") & ConfigurationManager.AppSettings("WebDataSource"))
    Dim cmd As SqlCommand
    Dim da As New SqlDataAdapter
    conn.Open()
    cmd = conn.CreateCommand()
    cmd.CommandType = CommandType.Text
    cmd.CommandText = "SELECT InvoiceDetail_Id, InvoiceHeader_Id, ActualFreightCharge, AmountCollected, " & _
                      "ChargedActualFreight, CollectedExternally, CollectedThroughAR, DateInvoiced, InvoiceNo, " & _
                      "InvoiceType, LineSubTotal, MasterInvoiceNo, OrderInvoiceKey, Reference1, TotalAmount, " & _
                      "TotalTax, isprocessed, storetoprocess " & _
                      "FROM dbo.InvoiceHeader " & _
                      "WHERE (isprocessed = 0) AND (storetoprocess = N'123')"
    da.SelectCommand = cmd
    da.Fill(dt)
    conn.Close()
    Return dt
End Function
Additional info
This is the information given to me by my manager:
Loop through each below:
SELECT InvoiceDetail_Id, InvoiceHeader_Id, ActualFreightCharge, AmountCollected,
ChargedActualFreight, CollectedExternally, CollectedThroughAR, DateInvoiced, InvoiceNo,
InvoiceType, LineSubTotal, MasterInvoiceNo, OrderInvoiceKey, Reference1, TotalAmount,
TotalTax, isprocessed, storetoprocess
FROM dbo.InvoiceHeader WHERE (isprocessed = 0) AND (storetoprocess = N'195')
Use the above InvoiceHeader_Id to loop through all of the order information. Process each Bundle (Kit):
SELECT InvoiceHeader_Id, LineDetails_Id, LineDetail_Id, OrderLine_Id, GiftFlag, [References],
GiftWrap, IsBundleParent, KitCode, KitQty, LevelOfService, LineSeqNo, LineType,
MaxLineStatus, MaxLineStatusDesc, MinLineStatus, MinLineStatusDesc, MinShipByDate,
OpenQty, OrderHeaderKey, OrderLineKey, OrderedQty, OriginalOrderedQty, OtherCharges,
OverallStatus, PipelineKey, PrimeLineNo, ReceivingNode, RemainingQty, ReqCancelDate,
ReqDeliveryDate,
ReqShipDate, SCAC, ScacAndService, ScacAndServiceKey, ShipNode, ShipToID, ShipToKey,
StatusQuantity, SubLineNo, SubstituteItemID, isprocessed
FROM dbo.OrderLine
WHERE (isprocessed = 0) AND (InvoiceHeader_Id = 13) AND (IsBundleParent = 'Y')
ORDER BY PrimeLineNo, SubLineNo
For each record in the above list, query below and add as Zero (0) dollar amount. The cost is associated with the information in the above record (you may have to query tables to get the values).
Get the OrderLineKey from the above record and query this below to get the associated sub items.
SELECT TOP (100) PERCENT
    dbo.BundleParentLine.InvoiceHeader_Id AS BPL_InvoiceHeader_Id,
    dbo.BundleParentLine.LineDetails_Id AS BPL_LineDetails_Id,
    dbo.BundleParentLine.LineDetail_Id AS BPL_LineDetail_Id,
    dbo.BundleParentLine.OrderLine_Id AS BPL_OrderLine_Id,
    dbo.BundleParentLine.BundleParentLine_id,
    dbo.BundleParentLine.SubLineNo AS BPL_SubLineNo,
    dbo.BundleParentLine.PrimeLineNo AS BPL_PrimeLineNo,
    dbo.BundleParentLine.OrderLineKey AS BPL_OrderLineKey,
    dbo.OrderLine.*
FROM dbo.BundleParentLine
INNER JOIN dbo.OrderLine
    ON dbo.BundleParentLine.OrderLine_Id = dbo.OrderLine.OrderLine_Id
    AND dbo.BundleParentLine.LineDetail_Id = dbo.OrderLine.LineDetail_Id
    AND dbo.BundleParentLine.LineDetails_Id = dbo.OrderLine.LineDetails_Id
    AND dbo.BundleParentLine.InvoiceHeader_Id = dbo.OrderLine.InvoiceHeader_Id
WHERE (dbo.BundleParentLine.OrderLineKey = N'76873264832')
    AND (dbo.BundleParentLine.InvoiceHeader_Id = 13)
    AND (dbo.BundleParentLine.LineDetails_Id = 6)
    AND (dbo.OrderLine.isprocessed = 0)
ORDER BY BPL_PrimeLineNo, BPL_SubLineNo
After Processed, set isprocessed = 1
Use the above InvoiceHeader_Id to loop through all of the order information:
SELECT InvoiceHeader_Id, LineDetails_Id, LineDetail_Id, OrderLine_Id, GiftFlag, [References],
       GiftWrap, IsBundleParent, KitCode, KitQty, LevelOfService, LineSeqNo, LineType,
       MaxLineStatus, MaxLineStatusDesc, MinLineStatus, MinLineStatusDesc, MinShipByDate,
       OpenQty, OrderHeaderKey, OrderLineKey, OrderedQty, OriginalOrderedQty, OtherCharges,
       OverallStatus, PipelineKey, PrimeLineNo, ReceivingNode, RemainingQty, ReqCancelDate,
       ReqDeliveryDate, ReqShipDate, SCAC, ScacAndService, ScacAndServiceKey, ShipNode,
       ShipToID, ShipToKey, StatusQuantity, SubLineNo, SubstituteItemID, isprocessed
FROM dbo.OrderLine
WHERE (isprocessed = 0) AND (InvoiceHeader_Id = 13) AND (IsBundleParent <> 'Y')
ORDER BY PrimeLineNo, SubLineNo
After Processed, set isprocessed = 1
This is not an answer, but I want to pose this to prompt you for some clarification, and that's not practical in the comments.
First, close Visual Studio, open SQL Server Management Studio, and connect to the database.
We'll build some SELECT statements to get the data back, and then we can think about an UPDATE statement.
This first query you posted gives you the unprocessed headers for store 195 (straight from your post). I don't know if you want to do this for all stores or not. Paste it into SSMS and run it:
SELECT H.*
FROM dbo.InvoiceHeader H
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
Now, this second query gives you the unprocessed lines against those headers that have IsBundleParent = 'Y'.
Look! No loops!
SELECT H.*, L.*
FROM dbo.InvoiceHeader H
INNER JOIN
dbo.OrderLine L
ON L.InvoiceHeader_Id = H.InvoiceHeader_Id
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
AND L.isprocessed = 0 AND L.IsBundleParent = 'Y'
Then in your explanation you have "For each record in the above list query below and add as Zero (0) dollar amount", which makes no sense to me, so perhaps you could clarify.
Now we'll add in the third query, which brings in BundleParentLine (that one does have inner joins in it already, but it's a pretty strange list of join fields):
SELECT H.*, L.*, BPL.*
FROM dbo.InvoiceHeader H
INNER JOIN
dbo.OrderLine L
ON L.InvoiceHeader_Id = H.InvoiceHeader_Id
INNER JOIN
dbo.BundleParentLine BPL
ON BPL.OrderLine_Id = L.OrderLine_Id
AND BPL.LineDetail_Id = L.LineDetail_Id
AND BPL.LineDetails_Id = L.LineDetails_Id
AND BPL.InvoiceHeader_Id = L.InvoiceHeader_Id
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
AND L.isprocessed = 0 AND L.IsBundleParent = 'Y'
Then in your explanation you have "After Processed, set isprocessed = 1", but what is "processed" here? Is it just setting the flag in the table?
To do that, you just write an UPDATE statement. Here's one way to update just the header (don't run it, though, until we understand what you're trying to do). Is the problem here that you want to set the header and the lines to processed at the same time?
UPDATE InvoiceHeader
SET isprocessed = 1
WHERE isprocessed = 0 AND storetoprocess = N'195'
The fourth query you have seems to account for the IsBundleParent <> 'Y' case.
Again, we need to know what "add as Zero (0) dollar amount" and "processed" mean before we can go any further.
Just wondering: can MySQL be used in Corona, or can MySQL DBs be opened using sqlite3?
Or is there any difference if I use a MySQL DB or a sqlite3 DB?
Corona provides its own library for connecting to SQLite databases, which is documented in the Corona API docs.
local sqlite3 = require "sqlite3"
--Open data.db. If the file doesn't exist it will be created
local path = system.pathForFile("data.db", system.DocumentsDirectory)
db = sqlite3.open( path )
--Handle the applicationExit event to close the db
local function onSystemEvent( event )
    if ( event.type == "applicationExit" ) then
        db:close()
    end
end
--Setup the table if it doesn't exist
local tablesetup = [[CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, content, content2);]]
print(tablesetup)
db:exec( tablesetup )
--Add rows with an auto index in 'id'. We don't need to specify a column list because we're populating all of the columns
local testvalue = {}
testvalue[1] = 'Hello'
testvalue[2] = 'World'
testvalue[3] = 'Lua'
local tablefill =[[INSERT INTO test VALUES (NULL, ']]..testvalue[1]..[[',']]..testvalue[2]..[['); ]]
local tablefill2 =[[INSERT INTO test VALUES (NULL, ']]..testvalue[2]..[[',']]..testvalue[1]..[['); ]]
local tablefill3 =[[INSERT INTO test VALUES (NULL, ']]..testvalue[1]..[[',']]..testvalue[3]..[['); ]]
db:exec( tablefill )
db:exec( tablefill2 )
db:exec( tablefill3 )
--print the sqlite version to the terminal
print( "version " .. sqlite3.version() )
--print all the table contents
for row in db:nrows("SELECT * FROM test") do
    local text = row.content.." "..row.content2
    local t = display.newText(text, 20, 30 * row.id, nil, 16)
    t:setFillColor( 1, 0, 1 )
end
--setup the system listener to catch applicationExit
Runtime:addEventListener( "system", onSystemEvent )
You can also use third-party libraries (or "batteries") for creating connections. One library in common use is LuaSQL.
A thread on the Corona community forum shows how to establish a connection using LuaSQL for MySQL servers.
local luasql = require "luasql.mysql";
env = assert((luasql.mysql()), "Uh oh, couldn't load driver")
conn = assert(env:connect("database","username","password","localhost"), "Oops!!")
cur = assert (conn:execute ("SELECT * from table_1" ))
row = cur:fetch ({}, "a")
while row do
    print ("\n------ new row ---------\n")
    table.foreach (row, print)
    row = cur:fetch (row, "a")
end
cur:close()
conn:close()
env:close()