Change datatype of a column in SQLite3 (Python) database

I have a database that I am reading with sqlite3 in Python 2.7, using the following code:
import os
import glob
import sqlite3

# Change to database directory
os.chdir(data)
# Find database file
cur_db = glob.glob('*.db')
# Connect to database
con = sqlite3.connect('database.db')
c = con.cursor()
# Query database
for row in c.execute('SELECT * FROM col1'):
    print row
which gives me something like:
(1, u'2.3', u'brown', u'0', u'hairy', u'banana', u'2', u'monkey')
I would like to select only the rows where the column holding u'2.3' is greater than 2. But the value is stored as a unicode string rather than a number, which makes it hard to compare against a number (e.g. 2).
Ideally, I would like something like:
# Connect to database
con = sqlite3.connect('database.db')
c = con.cursor()
# Query database
c.execute('SELECT * FROM critter WHERE weight > 2')
QUESTION: How can I add a conditional statement to extract only data rows where this element is greater than 2? I would like to leave the database unaltered.

Edit: oh, you want a query to do that... You can do:
for row in c.execute('select * from critter where cast(weight as numeric) > 2'):
    # do something with each matching row
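For instance, here's a minimal runnable sketch of that approach; the table and column names (critter, weight) and the database file come from the question, while the ? placeholder for the threshold is an addition:

import sqlite3

con = sqlite3.connect('database.db')
c = con.cursor()
# CAST converts the stored text to a number before comparing;
# the ? placeholder binds the threshold value safely
threshold = 2
for row in c.execute('SELECT * FROM critter WHERE CAST(weight AS NUMERIC) > ?',
                     (threshold,)):
    print(row)
con.close()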
Older answer:
If you want to do this on the Python side, you can use a try-except construct like so:
for row in c.execute('SELECT * FROM col1'):
    try:
        # the index is the position of the element in the tuple
        val = float(row[1])
        if val > 2:
            print(val)
    except ValueError:
        pass

Related

Convert string to variable name in Lua

In Lua, I have a set of tables:
Column01 = {}
Column02 = {}
Column03 = {}
ColumnN = {}
I am trying to access these tables dynamically depending on a value. So, later on in the programme, I am creating a variable like so:
local currentColumn = "Column" .. variable
Where variable is a number from 01 to N.
I then try to do something to all elements in my array like so:
for i = 1, #currentColumn do
    currentColumn[i] = *do something*
end
But this doesn't work, as currentColumn is a string, not the table itself. How can I get from the string to the table it names?
If I understand correctly, you're saying that you'd like to access a variable based on its name as a string? I think what you're looking for is _G, the table that holds all global variables.
Recall that table keys can be strings. Think of _G as one giant table where every global variable or table you make is just a key for a value.
Column1 = {"A", "B"}
string1 = "Column".."1" -- concatenate "Column" and "1"; you might swap out the 1 for a variable. With a variable you can use tostring to be explicit, like so:
var = 1
string2 = "Column"..tostring(var) -- becomes "Column1"
print(_G[string2])    -- prints the address of the table; you can index it like any other table:
print(_G[string2][1]) -- prints the 1st item of the table ("A")
So if you wanted to loop through 5 tables called Column1,Column2 etc, you could use a for loop to create the string then access that string.
C1 = {"A"} --I shorted the names to just C for ease of typing this example.
C2 = {"B"}
C3 = {"C"}
C4 = {"D"}
C5 = {"E"}
for i=1, 5 do
local v = "C"..tostring(i)
print(_G[v][1])
end
Output
A
B
C
D
E
Edit: I'm a doofus and I overcomplicated everything. There's a much simpler solution. If you only want to access the columns within a loop, rather than accessing individual columns at certain points, the easier solution is just to put all your columns into a bigger table and then iterate over that.
columns = {{"A", "1"}, {"B", "R"}} -- each anonymous table is a column; if it had a key attached, like column1 = {"A"}, it couldn't be iterated over numerically
-- You can also insert on the fly:
column3 = {"C"}
table.insert(columns, column3)
for i, v in ipairs(columns) do
    print(i, v[1]) -- i is the index and v is the table: prints which column you're on and the 1st item in it
end
Output:
1 A
2 B
3 C
To future readers: If you want a general solution to getting tables by their name as a string, the first solution with _G is what you want. If you have a situation like the asker, the second solution should be fine.

How to add a file extension at the end

How can I add .jpg at the end of all the serial numbers (001, 002, 003)?
C:\Users\Abc\Desktop\id card 10-12-19\001.jpg
C:\Users\Abc\Desktop\id card 10-12-19\002.jpg
Is there any way to add the extension (.jpg) at the end? I'd like to know how this works in both Microsoft Excel and Google Sheets. Any suggestion or help is appreciated. Thanks.
Try something like:
=ARRAYFORMULA(IF(A1:A="";;A1:A&".jpg"))
(The IF leaves blank cells blank, so empty rows don't get a stray ".jpg".)
=QUERY(ARRAYFORMULA(B21:B & ".jpg"), "select * where Col1 <> '.jpg'")
In this formula I combine QUERY with ARRAYFORMULA. I put the formula in C21; the data sits in column B, starting at B21.
=ARRAYFORMULA(B21:B & ".jpg")
means that every row in column B from B21 down gets ".jpg" appended.
The QUERY wrapper then returns only the rows where column B actually contains a string (not blank); blank rows would otherwise come out as just ".jpg".

How to check df rows that have a difference between 2 columns and then check another table to verify information

I'm very new to Python and have been trying hard these last few days to go through a df row by row and check each row that has a difference between columns dQ and dCQ. I just said != 0, since the value could be positive or negative. If that is true, I would like to check in another table whether certain criteria are met. I'm used to working in R, where I could store the df in a variable and call upon the column names; I can't seem to find a way to do that in Python. I posted all of the code I've been playing with. I know this is messy, but any help would be appreciated. Thank you!
I've tried installing different packages that wouldn't work, and I tried making a for loop (I failed miserably); maybe a function? I'm not sure where to even look. I've never formally learned Python; I'm doing my best watching videos online and reading on here.
import pyodbc
import pymysql
import pandas as pd
import numpy as np

conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};"
                      "Server=***-***-***.****.***.com;"
                      "Database=****;"
                      "Trusted_Connection=no;"
                      "UID=***;"
                      "PWD=***")

# First attempt with a raw cursor:
# cur = conn.cursor()
# cur.execute("SELECT TOP 1000 tr.dQ, po.dCQ, tr.dQ - po.dCQ AS diff "
#             "FROM [IP].[dbo].[vT] tr (nolock) "
#             "JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
#             "WHERE tr.dQ != po.dCQ")
# query = cur.fetchall()

query = ("SELECT TOP 100 tr.dQ, po.dCQ /*, tr.dQ - po.dCQ AS diff */ "
         "FROM [IP].[dbo].[vT] tr (nolock) "
         "INNER JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
         "WHERE tr.dQ != po.dCQ")
df = pd.read_sql(query, conn)
# print(df[2,])

cursor = conn.cursor(pymysql.cursors.DictCursor)
cursor.execute(query)
result_set = cursor.fetchall()
for row in result_set:
    print("%s, %s" % (row["name"], row["category"]))

# Other attempts:
# if df[3] != 0:
#     diff = df[1] - df[2]
#     print(diff)
# else:
#     exit
#
# cursor = conn.cursor()
# for row in cursor.fetchall():
#     print(row)
#
# for record in df:
#     if record[1] != record[2]:
#         print(record[3])
#     else:
#         record[3] = record[1]
#         print(record)
#
# df['diff'] = np.where(df['dQ'] != df["dCQ"])
I expect some sort of notification that there's a difference in row xx, and then a check in table vP to verify that we received this data's details. I believe I can get to that point if I can get the first part working. Any help is appreciated. I'm sorry if this question is not clear; I will do my best to answer any questions. Thank you!
One solution could be to make a new column that stores the result of the diff between the two columns. One note first: it might be more precise to either name your columns when you make the df and then reference them with df['name1'] and df['name2'], or to use df.iloc[:,1] and df.iloc[:,2]. Note that column numbers start with zero, so these refer to the second and third columns in the df. The reason to use iloc with the colon is to state explicitly that you want all rows of columns 1 and 2. Otherwise, with df[1] or df[2], if your df were transposed, that could actually refer to what you think of as the index. Now, on to a solution.
You could try
df['diff'] = df.iloc[:, 1] - df.iloc[:, 2]
df['diff_bool'] = np.where(df['diff'] == 0, False, True)
or you could combine this into one method
df['diff_bool'] = np.where(df.iloc[:, 1] - df.iloc[:, 2] == 0, False, True)
This will create a column in your df that says whether there is a difference between columns one and two. You don't actually need to loop through row by row, because pandas operations work like matrix math: df.iloc[:,1] - df.iloc[:,2] applies the subtraction row by row automatically.
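To see the whole idea end to end, here's a minimal, self-contained sketch; the toy values are made up for illustration, and only the column names dQ and dCQ come from the question:

import pandas as pd

# toy stand-ins for the real query results
df = pd.DataFrame({'dQ': [5, 3, 7], 'dCQ': [5, 1, 9]})

# vectorised subtraction: pandas handles every row at once, no loop needed
df['diff'] = df['dQ'] - df['dCQ']

# keep only the rows where the two quantities disagree
mismatches = df[df['diff'] != 0]
print(mismatches)

From there, the mismatching rows could be joined against the verification table (the question's vP) with df.merge to check whatever criteria apply.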

How can I loop macros in PySpark like in SAS?

I want to run the same code for different sets of macro variables, as I would in SAS, and then append all the resulting tables together. Coming from a SAS background, I'm quite confused about how to do this in a PySpark environment. Any help is much appreciated!
Example code is below :
STEP1: define macro variables
lastyear_st = 201615
lastyear_end = 201622
thisyear_st = 201715
thisyear_end = 201722
STEP2: loop the code through the various macro variables
customer_spend = sqlContext.sql("""
    select a.customer_code,
           sum(case when a.week_id between %d and %d then a.spend else 0 end) as spend
    from tableA a
    group by a.card_code
""" % (lastyear_st, lastyear_end))
# ...and then run the same query again with (thisyear_st, thisyear_end)
STEP3: append each of the datasets populated above to the base table
import os
import tempfile

# macroVars holds your start and end values as a list of lists,
# where each inner list contains one start and one end value
macroVars = [[201615, 201622], [201715, 201722]]

# loop through the list of lists
for start, end in macroVars:
    # prepare the query using the values of start and end
    query = """
        SELECT a.customer_code,
               SUM(CASE WHEN a.week_id BETWEEN {} AND {} THEN a.spend ELSE 0 END) AS spend
        FROM tableA a
        GROUP BY a.card_code
    """.format(start, end)
    # execute the query
    customer_spend = sqlContext.sql(query)
    # depending on your base table setup, use the appropriate write command, for example:
    customer_spend \
        .write.mode('append') \
        .parquet(os.path.join(tempfile.mkdtemp(), 'data'))
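If you'd rather combine everything in memory instead of appending to files, one possible alternative (a sketch reusing the sqlContext, macroVars, and table names assumed above) is to collect the per-iteration DataFrames and union them:

from functools import reduce

query_template = """
    SELECT a.customer_code,
           SUM(CASE WHEN a.week_id BETWEEN {} AND {} THEN a.spend ELSE 0 END) AS spend
    FROM tableA a
    GROUP BY a.card_code
"""

# run the query once per start/end pair and keep each result
frames = [sqlContext.sql(query_template.format(start, end))
          for start, end in macroVars]

# stack the per-period results into a single DataFrame
# (use .unionAll instead of .union on Spark < 2.0)
customer_spend_all = reduce(lambda a, b: a.union(b), frames)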

Using R to save results of a lm model to a database

I'm trying to take the results of a linear regression performed in R and store those results in a database.
Specifically, what I'm after is the data in coef(summary(myModel)). I can turn that into a dataframe and use sqlSave(), but the coefficient names are not a column in the dataframe. How do I get the coefficients and the variable names into a single dataframe that can be saved using sqlSave()?
For clarity, I'm trying to store the data in a database table that has the columns:
VariableName, Estimate, StdError, tValue, pValue
Is there an easier way to prepare this data to be stored in a database? As an example, here's what coef(summary(myModel)) gives:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.52729727 2.623035966 19.64414439 1.941150e-58
factor(person)507 -0.73663931 2.627215539 -0.28038785 7.793456e-01
factor(person)713 -5.18612049 3.317899029 -1.56307363 1.189390e-01
TransCnt 0.02658798 0.005682853 4.67863266 4.132888e-06
factor(Month)5 0.67908563 1.119655304 0.60651312 5.445673e-01
factor(Month)6 2.09595623 1.169658148 1.79193915 7.400639e-02
factor(Month)7 2.91204838 1.333483558 2.18379024 2.964109e-02
# pull the coefficient matrix out of the model summary
datOut <- summary(myModel)$coef
# move the row names (the variable names) into a real column
datOut <- cbind(VariableName = rownames(datOut), datOut)
rownames(datOut) <- NULL
If you want to add your own column names:
colnames(datOut) <- c("VariableName", "Estimate", "StdError", "tValue", "pValue")
datOut
The table produced by summary.lm is a matrix. You can coerce it to a dataframe with as.data.frame:
df.coef <- as.data.frame(coef(summary(myModel)))
The column names will be coerced to valid R names with no spaces or quotes.
