I'm not sure why I can't test my function. My desired output is ID, then Room, but if there are multiple rooms for the same ID, each room should go in a new row, like
ID Room
1 SW128 SW143
into
ID Room
1 SW128
1 SW143
This is some of the data in the file.
1,SW128,SW143
2,SW309
3,AA205
4,AA112,SY110
5,AC223
6,AA112,AA206
But I can't even test my function. Can anyone please help me fix this?
def create_location_table(db, loc_file):
    '''Location table has format ID, Room'''
    con = sqlite3.connect(db)
    cur = con.cursor()
    cur.execute('''DROP TABLE IF EXISTS Locations''')
    # create the table
    cur.execute('''CREATE TABLE Locations (id TEXT, Room TEXT)''')
    # Add the rows
    loc_file = open('locations.csv', 'r')
    loc_file.readline()
    for line in loc_file:
        d = {}
        data = line.split(',')
        ID = data[0]
        Room = data[1:]
        for (ID, Room) in d.items():
            if Room not in d:
                d[ID] = [Room]
                for i in Rooms:
                    cur.execute(''' INSERT INTO Locations VALUES(?, ?)''', (ID, Room))
    # commit and close cursor and connection
    con.commit()
    cur.close()
    con.close()
The problem is that d is always an empty dict, so the for (ID, Room) in d.items() loop won't do anything. What you need to do is loop over Room. And you don't need the d dict at all.
import sqlite3

def create_location_table(db, loc_file):
    '''Location table has format ID, Room'''
    con = sqlite3.connect(db)
    cur = con.cursor()
    cur.execute('''DROP TABLE IF EXISTS Locations''')
    # create the table
    cur.execute('''CREATE TABLE Locations (id TEXT, Room TEXT)''')
    # open the CSV
    csv_content = open(loc_file, 'r')
    for line in csv_content:
        data = line.strip().split(',')
        # made lowercase following PEP 8, but 'id' is a built-in name in Python
        idx = data[0]
        rooms = data[1:]
        # loop through the rooms of this line and insert one row per room
        for room in rooms:
            cur.execute(''' INSERT INTO Locations VALUES(?, ?)''', (idx, room))
            # for debug purposes only
            print('INSERT INTO Locations VALUES(%s, %s)' % (idx, room))
    # commit and close cursor and connection
    con.commit()
    cur.close()
    con.close()

# call the method
create_location_table('db.sqlite3', 'locations.csv')
Note: Following PEP 8, I made your variables lowercase.
EDIT: posted the full code example and used the loc_file parameter.
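To sanity-check the result, here is a minimal sketch (assuming the db.sqlite3 file created by the call above) that prints every inserted row:

import sqlite3

# Sketch only: dump the Locations table to confirm one row was inserted per room
con = sqlite3.connect('db.sqlite3')
cur = con.cursor()
for row in cur.execute('SELECT id, Room FROM Locations ORDER BY id'):
    print(row)  # e.g. ('1', 'SW128') then ('1', 'SW143'), ...
con.close()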
Related
I've been trying to learn how to use sqlite3 with Python 3.10, and I can't find any explanation of how I'm supposed to grab saved data from a database and put it into a variable.
I'm attempting to do that myself in this code, but it just prints out
<sqlite3.Cursor object at 0x0000018E3C017AC0>
Anyone know the solution to this?
My code is below
import sqlite3

con = sqlite3.connect('main.db')
cur = con.cursor()

# Create a table called "datatable" if it does not exist
cur.execute('''CREATE TABLE IF NOT EXISTS datatable
               (Name PRIMARY KEY, age, pronouns) ''')
# The key "PRIMARY KEY" after Name disallows information from being inserted
# into the table twice in a row.

name = 'TestName'  # input("What is your name? : ")
age = 'TestAge'  # input("What is your age? : ")

def data_entry():
    cur.execute("INSERT INTO datatable (name, age)")
    con.commit

name = cur.execute('select name from datatable')
print(name)
Expected result from print(name): TestName
Actual result: <sqlite3.Cursor object at 0x00000256A58B7AC0>
The execute statement fills the cur object with data, but then you need to get the data out:
rows = cur.fetchall()
for row in rows:
    print(row)
You can read more here: https://www.sqlitetutorial.net/sqlite-python/sqlite-python-select/
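Applied to the code above, a minimal sketch (assuming the datatable table already contains the TestName row) looks like this:

import sqlite3

con = sqlite3.connect('main.db')
cur = con.cursor()

# execute() returns the cursor itself; the rows still have to be fetched
cur.execute('SELECT name FROM datatable')
row = cur.fetchone()      # first matching row as a tuple, or None if the table is empty
if row is not None:
    name = row[0]         # unpack the single column into a plain variable
    print(name)           # prints TestName instead of the cursor object

con.close()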
I have a lot of data in the form of a list of dictionaries. I want to insert all the data into a Snowflake table.
The primary key on the table is ID, and I can receive new data for which an ID is already present, in which case I need to update the existing row. What I have done so far: since the data is large, I insert it in batches into a temporary table, and then from the temporary table I use a MERGE query to update/insert into the main table.
def batch_data(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

def upsert_user_data(self, user_data):
    columns = ["\"" + x + "\"" for x in user_data[0].keys()]
    values = ['?' for _ in user_data[0].keys()]
    for chunk in batch_data(user_data, 1000):
        sql = f"INSERT INTO TEMP ({','.join(columns)}) VALUES ({','.join(values)});"
        print(sql)
        data_to_load = [[x for x in i.values()] for i in chunk]
        snowflake_client.run(sql, tuple(data_to_load))

    sql = "MERGE INTO USER USING (SELECT ID AS TID, NAME AS TNAME, STATUS AS TSTATUS FROM TEMP) AS TEMPTABLE " \
          "ON USER.ID = TEMPTABLE.TID WHEN MATCHED THEN UPDATE SET USER.NAME = TEMPTABLE.TNAME, USER.STATUS = TEMPTABLE.TSTATUS " \
          "WHEN NOT MATCHED THEN INSERT (ID, NAME, STATUS) VALUES (TEMPTABLE.TID, TEMPTABLE.TNAME, TEMPTABLE.TSTATUS);"
    snowflake_client.run(sql)
Is there any way I can remove the temporary table and use only the MERGE query, still in a batched way?
Currently my code has simple tables containing the data needed for each object, like this:
infantry = {class = "army", type = "human", power = 2}
cavalry = {class = "panzer", type = "motorized", power = 12}
battleship = {class = "navy", type = "motorized", power = 256}
I use the table names as identifiers in various functions to have their values processed one by one; a function is simply called to get access to the values.
Now I want to have this data stored in a spreadsheet (csv file) instead that looks something like this:
Name class type power
Infantry army human 2
Cavalry panzer motorized 12
Battleship navy motorized 256
The spreadsheet will not have more than 50 lines and I want to be able to increase columns in the future.
I tried a couple of approaches from similar situations I found here, but due to lacking skills I failed to access any values from the nested table. I think this is because I don't fully understand how the tables are structured after each line of the csv file is read into the table, and therefore I fail to print any values at all.
If there is a way to get the name, class, type, power from the table and use that line just like my old simple tables, I would appreciate an educational example. Another approach could be to declare new tables from the csv that behave exactly like my old simple tables, line by line from the csv file. I don't know if this is doable.
Using Lua 5.1
You can read the csv file in as a string; I will use a multi-line string here to represent the csv.
gmatch with pattern [^\n]+ will return each row of the csv.
gmatch with pattern [^,]+ will return the value of each column from a given row.
If more rows or columns are added, or if the columns are moved around, we will still reliably convert the information as long as the first row has the header information.
The only column that cannot move is the first one, the Name column; if that is moved, it will change the key used to store the row in the table.
Using gmatch and 2 patterns, [^,]+ and [^\n]+, you can separate the string into each row and column of the csv. Comments in the following code:
local csv = [[
Name,class,type,power
Infantry,army,human,2
Cavalry,panzer,motorized,12
Battleship,navy,motorized,256
]]

local items = {} -- Store our values here
local headers = {} -- Store the header names by column index
local first = true

for line in csv:gmatch("[^\n]+") do
    if first then -- this is to handle the first line and capture our headers.
        local count = 1
        for header in line:gmatch("[^,]+") do
            headers[count] = header
            count = count + 1
        end
        first = false -- set first to false to switch off the header block
    else
        local name
        local i = 2 -- We start at 2 because column 1 (Name) is used as the key, not stored as a field
        for field in line:gmatch("[^,]+") do
            name = name or field -- check if we know the name of our row
            if items[name] then -- if the name is already in the items table then this is a field
                items[name][headers[i]] = field -- assign our value at the header in the table with the given name.
                i = i + 1
            else -- if the name is not in the table we create a new index for it
                items[name] = {}
            end
        end
    end
end
Here is how you can load a csv using the I/O library:
-- Example of how to load the csv.
path = "some\\path\\to\\file.csv"
local f = assert(io.open(path))
local csv = f:read("*all")
f:close()
Alternatively, you can use io.lines(path), which would take the place of csv:gmatch("[^\n]+") in the for loop sections as well.
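For example, a minimal sketch of that variant (the path is a placeholder, and the loop body is the same as above):

-- Sketch only: io.lines(path) yields one line at a time, so it replaces
-- the csv:gmatch("[^\n]+") iterator from the loop above.
local path = "some\\path\\to\\file.csv"  -- placeholder path

for line in io.lines(path) do
    -- handle the header row and the field rows exactly as in the loop above
end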
Here is an example of using the resulting table:
-- print table out
print("items = {")
for name, item in pairs(items) do
    print("    " .. name .. " = {")
    for field, value in pairs(item) do
        print("        " .. field .. " = " .. value .. ",")
    end
    print("    },")
end
print("}")
The output:
items = {
    Infantry = {
        type = human,
        class = army,
        power = 2,
    },
    Battleship = {
        type = motorized,
        class = navy,
        power = 256,
    },
    Cavalry = {
        type = motorized,
        class = panzer,
        power = 12,
    },
}
My values:
user = [[34, 'Victoria', '17:34:50', None], [40, 'Meherin', '00:04:00', '23:56:10'], [30, 'Micahle', '18:58:43', None]]
I have a PostgreSQL function named merge_db() and it takes 4 arguments. Now I want to insert the values from user with Python.
The PostgreSQL function:
CREATE FUNCTION merge_db(id1 integer, name1 character varying, login1 time, logout1 time) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the id
        UPDATE my_company SET (name, login, logout) = (name1, login1, logout1) WHERE id = id1;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO my_company(id, name, login, logout) VALUES (id1, name1, login1, logout1);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Do nothing, and loop to try the UPDATE again.
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;
My Python code looks like this:
insert_query = "SELECT merge_db(%s) values %s"
execute_values(cur, insert_query, user)
conn.commit()
In this case it throws "ValueError: the query contains more than one '%s' placeholder".
I don't clearly understand how to send the user values as merge_db arguments.
Any help would be appreciated.
Thanks.
for i in user:
    print(i[0], i[1], i[2], i[3])
    insert_query = "SELECT merge_db({}, '{}', '{}', '{}')".format(i[0], i[1], i[2], i[3])
    cur.execute(insert_query)
It'll work fine, but it can raise a duplicate key error.
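As a safer alternative (a minimal sketch, assuming the psycopg2 connection conn and cursor cur from the question), you can let the driver do the quoting by passing the values as parameters instead of formatting them into the string; that also turns Python's None into SQL NULL:

# Sketch only: parameterised call to the merge_db() function defined above
insert_query = "SELECT merge_db(%s, %s, %s, %s)"

for row in user:
    cur.execute(insert_query, tuple(row))  # psycopg2 substitutes the four placeholders safely

conn.commit()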
With RODBC, there were functions like sqlUpdate(channel, dat, ...) that allowed you to pass dat = data.frame(...) instead of having to construct your own SQL string.
However, with R's DBI, all I see are functions like dbSendQuery(conn, statement, ...) which only take a string statement and give no opportunity to specify a data.frame directly.
So how to UPDATE using a data.frame with DBI?
My answer is really late, but maybe it is still helpful...
There is no single function (that I know of) in the DBI/odbc package, but you can replicate the update behavior using a prepared update statement (which should work faster than RODBC's sqlUpdate since it sends the parameter values as a batch to the SQL server):
library(DBI)
library(odbc)
con <- dbConnect(odbc::odbc(), driver="{SQL Server Native Client 11.0}", server="dbserver.domain.com\\default,1234", Trusted_Connection = "yes", database = "test") # assumes Microsoft SQL Server
dbWriteTable(con, "iris", iris, row.names = TRUE) # create and populate a table (adding the row names as a separate column used as row ID)
update <- dbSendQuery(con, 'update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Length"=?, "Petal.Width"=?, "Species"=? WHERE row_names=?')
# create a modified version of `iris`
iris2 <- iris
iris2$Sepal.Length <- 5
iris2$Petal.Width[2] <- 1
iris2$row_names <- rownames(iris) # use the row names as unique row ID
dbBind(update, iris2) # send the updated data
dbClearResult(update) # release the prepared statement
# now read the modified data - you will see the updates did work
data1 <- dbReadTable(con, "iris")
dbDisconnect(con)
This works only if you have a primary key, which I created in the above example by using the row names (a unique number, increased by one for each row)...
For more information about the odbc package I have used in the DBI dbConnect statement see: https://github.com/rstats-db/odbc
Building on R Yoda's answer, I made myself the helper function below. This allows using a dataframe to specify update conditions.
While I built this to run transaction updates (i.e. single rows), it can in theory update multiple rows by passing a condition. However, that's not the same as updating multiple rows using an input dataframe. Maybe somebody else can build on this...
dbUpdateCustom = function(x, key_cols, con, schema_name, table_name) {

  if (nrow(x) != 1) stop("Input dataframe must be exactly 1 row")
  if (!all(key_cols %in% colnames(x))) stop("All columns specified in 'key_cols' must be present in 'x'")

  # Build the update string --------------------------------------------------

  df_key <- dplyr::select(x, one_of(key_cols))
  df_upt <- dplyr::select(x, -one_of(key_cols))

  set_str <- purrr::map_chr(colnames(df_upt), ~glue::glue_sql('{`.x`} = {x[[.x]]}', .con = con))
  set_str <- paste(set_str, collapse = ", ")

  where_str <- purrr::map_chr(colnames(df_key), ~glue::glue_sql("{`.x`} = {x[[.x]]}", .con = con))
  where_str <- paste(where_str, collapse = " AND ")

  update_str <- glue::glue('UPDATE {schema_name}.{table_name} SET {set_str} WHERE {where_str}')

  # Execute ------------------------------------------------------------------

  query_res <- DBI::dbSendQuery(con, update_str)
  DBI::dbClearResult(query_res)

  return (invisible(TRUE))
}
Where
x: 1-row dataframe that contains 1+ key columns, and 1+ update columns.
key_cols: character vector, of 1 or more column names that are the keys (i.e. used in the WHERE clause)
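For illustration, here is a hedged usage sketch of the helper; the connection, schema, table, and column names are all made up:

# Hypothetical usage: update the row whose id is 42, setting name and status.
# `con` is an open DBI connection; "dbo"."users" and the column names are placeholders.
new_row <- data.frame(id = 42, name = "new name", status = "active",
                      stringsAsFactors = FALSE)

dbUpdateCustom(new_row, key_cols = "id", con = con,
               schema_name = "dbo", table_name = "users")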
Here is a little helper function I put together using REPLACE INTO to update a table using DBI, replacing old duplicate entries with the new values. It's basic and for my own needs, but should be easy to modify. All you need to pass to the function is the connection, table name, and dataframe. Note that the table must have a PRIMARY KEY column. I've also included a simple working example.
row_to_list <- function(Y) suppressWarnings(split(Y, f = row(Y)))
sql_val <- function(y){
  if(!is.numeric(y)){
    return(paste0("'", y, "'"))
  }else{
    if(is.na(y)){
      return("NULL")
    }else{
      return(as.character(y))
    }
  }
}
to_sql_row <- function(x) paste0("(",paste(do.call("c", lapply(x, FUN = sql_val)), collapse = ", "),")")
bracket <- function(x) paste0("`",x,"`")
to_sql_string <- function(x) paste0("(",paste(sapply(x, FUN = bracket), collapse = ", "),")")
replace_into_table <- function(con, table_name, new_data){
  #new_data <- data.table(new_data)
  cols <- to_sql_string(names(new_data))
  vals <- paste(lapply(row_to_list(new_data), FUN = to_sql_row), collapse = ", ")
  query <- paste("REPLACE INTO", table_name, cols, "VALUES", vals)
  rs <- dbExecute(con, query)
  return(rs)
}
tb <- data.frame("id" = letters[1:20], "A" = 1:20, "B" = seq(.1,2,.1)) # sample data
dbWriteTable(con, "test_table", tb) # create table
dbExecute(con, "ALTER TABLE test_table ADD PRIMARY KEY (id)") # set primary key
new_data <- data.frame("id" = letters[19:23], "A" = 1:5, "B" = seq(101,105)) # new data
new_data[4,2] <- NA # add some NA values
new_data[5,3] <- NA
table_name <- "test_table"
replace_into_table(con, "test_table", new_data)
result <- dbReadTable(con, "test_table")