I'm dynamically generating queries against 11 different tables in SQL Server and storing the results as CSV files in S3.
However, when a nullable integer column contains NULLs, pandas converts it to float on the way to CSV, so the subsequent COPY command returns an error.
I really need to avoid that. Is there an option for that?
for object in table_list:
    if args.load_type == "full":
        query_load = object["query_full"]
    else:
        query_load = object["query_active"]
    df = pd.read_sql_query(query_load, sql_server_conn)
    df = df.replace(",", " ", regex=True)
    df = df.replace("\n", " ", regex=True)
    #print(df)
    #df = df * 1
    #print(df.dtypes)
    #print(df.info())
    df = df.assign(extraction_dttm=currentdate)
    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=False)
    folder_name = "{}".format(object["lake_table_name"])
    file_name = "{}_{}.csv".format(object["lake_table_name"], currentdate.strftime("%Y%m%d"))
    full_path_to_file = DATALAKE_PATH + "/" + folder_name + "/" + file_name
    # print("{} - Storing files in {} ... ".format(dt.utcnow(), datalake_bucket))
    s3_resource.Object(datalake_bucket, full_path_to_file).put(Body=csv_buffer.getvalue())
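One way to avoid this, sketched under the assumption that you know which columns should stay integral (or are happy letting pandas infer them): cast the affected columns to pandas' nullable Int64 dtype before to_csv, so NULLs become empty CSV fields instead of forcing the whole column to float.

import pandas as pd

# Hypothetical frame for illustration: "b" holds a NULL, so plain int64 can't represent it
df = pd.DataFrame({"a": [1, 2], "b": [3, None]})
print(df.dtypes)  # "b" is float64, and 3 would be written out as "3.0"

# Either cast the known integer columns explicitly...
df["b"] = df["b"].astype("Int64")
# ...or let pandas pick nullable dtypes for every column
df = df.convert_dtypes()

print(df.to_csv(index=False))  # "b" now writes 3 as "3" and NULL as an empty field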
I understand that you cannot do a full Snowflake data dump in one shot and instead need to use the COPY command to unload data from each table into an internal (i.e. Snowflake) stage.
To automate the process, I thought I would do it with Python. Do you think that is the best method?
import traceback
import snowflake.connector
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine

url = URL(
    user='??????',
    password='????????',
    account='??????-??????',
    database='SNOWFLAKE',
    role='ACCOUNTADMIN'
)
out_put_string = ""
connection = None
try:
    engine = create_engine(url)
    connection = engine.connect()
    # Get all the views from the SNOWFLAKE database
    query = '''
    show views in database SNOWFLAKE
    '''
    df = pd.read_sql(query, connection)
    # Loop over all the views
    df = df.reset_index()  # make sure indexes pair with number of rows
    for index, row in df.iterrows():
        out_put_string += "VIEW:----------" + row['schema_name'] + "." + row['name'] + "----------\n"
        df_view = pd.read_sql('select * from ' + row['schema_name'] + "." + row['name'], connection)
        df_view.to_csv("/Temp/Output_CVS/" + row['schema_name'] + "-" + row['name'] + ".csv")
        out_put_string += df_view.to_string() + "\n"
except Exception:
    print("ERROR:")
    traceback.print_exc()
finally:
    # close the connection whether or not an error occurred
    if connection is not None:
        connection.close()

# Export all the views in one file
with open("/Temp/Output_CVS/AllViewsData.txt", "w") as text_file:
    text_file.write(out_put_string)
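As an alternative to pulling every row through pandas, you could let Snowflake do the unload server-side with COPY INTO a stage and then GET the files, which is the pattern Snowflake describes for data unloading. A minimal sketch with snowflake-connector-python; the credentials are placeholders as in the question, and the view name is a hypothetical stand-in for one row of the SHOW VIEWS result:

import snowflake.connector

con = snowflake.connector.connect(
    user="??????",
    password="????????",
    account="??????-??????",
    warehouse="??????",  # a running warehouse is needed to execute the COPY
)
try:
    cur = con.cursor()
    # Unload one view into the user stage (@~) as gzipped CSV, server-side;
    # ACCOUNT_USAGE.QUERY_HISTORY stands in for row['schema_name'] + '.' + row['name']
    cur.execute("""
        COPY INTO @~/unload/ACCOUNT_USAGE.QUERY_HISTORY/
        FROM (SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY)
        FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
        OVERWRITE = TRUE
    """)
    # Then download the staged files to the local output directory
    cur.execute("GET @~/unload/ file:///Temp/Output_CVS/")
finally:
    con.close()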
Hello everyone, I am writing my Discord bot. I get the values from Google Sheets, but they are not displayed nicely. How can I align them so that each name sits under the names and each number sits under the numbers?
Here's how it turns out: https://i.stack.imgur.com/Gdquh.png
And it should be like this: https://i.stack.imgur.com/UAPg2.png
spreadsheet_id = 'id'
result = service.spreadsheets().values().get(spreadsheetId=spreadsheet_id, range='A1:C15', majorDimension='ROWS').execute()
values = result.get('values', [])
embed = discord.Embed(description="\n".join([x[0] + " " + x[2] for x in values]))
result2 = service.spreadsheets().values().get(spreadsheetId=spreadsheet_id, range='A16:C28', majorDimension='ROWS').execute()
values2 = result2.get('values', [])
embed2 = discord.Embed(description="\n".join([x[0] + " " + x[2] for x in values2]))
I've been putting my data into pandas DataFrames, so I've been using to_markdown https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_markdown.html and printing it within code blocks.
import pandas as pd

# to_markdown needs the optional 'tabulate' package installed
array = pd.DataFrame(values)
embed = discord.Embed(description='```' + array.to_markdown(index=False) + '```')
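If you'd rather not add the pandas and tabulate dependencies just for alignment, plain string padding inside a code block gets the same effect. A minimal sketch, assuming values is the list of rows from the question with the name in column 0 and the number in column 2:

# Pad every name to the width of the longest one; the fixed-width font
# inside ``` code blocks is what keeps the columns lined up.
width = max(len(x[0]) for x in values)
lines = [f"{x[0]:<{width}} {x[2]}" for x in values]
embed = discord.Embed(description="```" + "\n".join(lines) + "```")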
I'm currently writing some code and am using pandas to export all of the data into CSV files. My program runs multiple iterations until it has gone through all of the necessary files. pandas re-writes one file each iteration, but when it moves onto the next file I need it to reset all of the data (I think).
The structure is roughly:
while loop > a few variables are named > program runs > dataframe = pandas.DataFrame(averagepercentagelist, index=namelist, columns=header)
This part works with no problem for one file. When moving onto the next file, all of the arrays I use are reset, and I think this is why pandas gives the error Shape of passed values is (1, 1), indices imply (3, 1).
Please let me know if I need to explain it better.
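For what it's worth, that error reproduces whenever the data and the index disagree in length, so it points at a list that is not being reset between files. A minimal sketch with hypothetical values:

import pandas as pd

# One data value but three index labels raises:
# ValueError: Shape of passed values is (1, 1), indices imply (3, 1)
pd.DataFrame(["12"], index=["Name0", "Name1", "Name2"],
             columns=["Average Percentage"])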
EDIT:
while True:
    try:
        averagepercentagelist = []
        namelist = []
        columns = []
        for row in database:
            averagepercentagelist = ["12", "23"]
            namelist = ["Name0", "Name1"]
            columns = ["Average percentage"]
            dataframe = pandas.DataFrame(averagepercentagelist, index=namelist, columns=header)
    except Exception as e:
        print(e)
        break
SNIPPET:
dataframe = pandas.DataFrame(averagepercentagelist, index=namelist, columns=header)
currentcalculatedatafrane = 'averages' + currentcalculate
dataframeexportpath = os.path.join(ROOT_PATH, 'Averages', currentcalculatedatafrane)
dataframe.to_csv(dataframeexportpath)
FULL PROGRAM SO FAR:
import csv
import os
import re
import pandas
import tkinter as tk
from tkinter import messagebox
from os.path import isfile, join
from os import listdir
import time

ROOT_PATH = os.path.dirname(os.path.abspath(__file__))
indexforcalcu = 0
line_count = 0
testlist = []
namelist = []
header = ['Average Percentage']

def clearvariables():
    indexforcalcu = 0
    testlist = []

def findaverageofstudent(findaveragenumber, numoftests):
    total = 0
    findaveragenumber = findaveragenumber / numoftests
    findaveragenumber = round(findaveragenumber, 1)
    return findaveragenumber

def removecharacters(nameforfunc):
    nameforfunc = str(nameforfunc)
    elem = re.sub("[{'}]", "", nameforfunc)
    return elem

def getallclasses():
    onlyfiles = [f for f in listdir(ROOT_PATH) if isfile(join(ROOT_PATH, f))]
    onlyfiles.remove("averagecalculatorv2.py")
    return onlyfiles

def findaveragefunc():
    indexforcalcu = -1
    while True:
        try:
            totaltests = 0
            line_count = 0
            averagepercentagelist = []
            indexforcalcu = indexforcalcu + 1
            allclasses = getallclasses()
            currentcalculate = allclasses[indexforcalcu]
            classpath = os.path.join(ROOT_PATH, currentcalculate)
            with open(classpath) as csv_file:
                classscoredb = csv.reader(csv_file, delimiter=',')
                for i, row in enumerate(classscoredb):
                    if line_count == 0:
                        while True:
                            try:
                                totaltests = totaltests + 1
                                rowreader = {row[totaltests]}
                            except:
                                totaltests = totaltests - 1
                                line_count = line_count + 1
                                break
                    else:
                        calculating_column_location = 1
                        total = 0
                        while True:
                            try:
                                total = total + int(row[calculating_column_location])
                                calculating_column_location = calculating_column_location + 1
                            except:
                                break
                        i = str(i)
                        name = row[0]
                        cleanname = removecharacters(nameforfunc=name)
                        namelist.append(cleanname)
                        findaveragenumbercal = findaverageofstudent(findaveragenumber=total, numoftests=totaltests)
                        averagepercentagelist.append(findaveragenumbercal)
                        line_count = line_count + 1
                        dataframe = pandas.DataFrame(averagepercentagelist, index=namelist, columns=header)
                        currentcalculatedatafrane = 'averages' + i + currentcalculate
                        dataframeexportpath = os.path.join(ROOT_PATH, 'Averages', currentcalculatedatafrane)
                        dataframe.to_csv(dataframeexportpath)
                        i = int(i)
        except Exception as e:
            print("ERROR!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n\n", e)
            break

def makenewclass():
    global newclassname
    getclassname = str(newclassname.get())
    if getclassname == "":
        messagebox.showerror("Error", "The class name you have entered is invalid.")
    else:
        classname = getclassname + ".csv"
        with open(classname, mode='w') as employee_file:
            classwriter = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            classwriter.writerow(["Name", "Test 1"])

root = tk.Tk()
root.title("Test result average finder")
# pass the function itself, not findaveragefunc(), or it runs immediately at startup
findaveragebutton = tk.Button(root, text="Find Averages", command=findaveragefunc)
findaveragebutton.grid(row=2, column=2, padx=(10, 10), pady=(0, 10))
classnamelabel = tk.Label(root, text="Class name:")
classnamelabel.grid(row=1, column=0, padx=(10, 0), pady=(10, 10))
newclassname = tk.Entry(root)
newclassname.grid(row=1, column=1, padx=(10, 10))
newclassbutton = tk.Button(root, text="Create new class", command=makenewclass)
newclassbutton.grid(row=1, column=2, padx=(0, 10), pady=(10, 10))
root.mainloop()
Thanks in advance,
Sean
Use:
import glob, os
import pandas as pd

ROOT_PATH = os.path.dirname(os.path.abspath(__file__))

# extract all csv files to a list
files = glob.glob(f'{ROOT_PATH}/*.csv')
print(files)

# create the new folder if necessary
new = os.path.join(ROOT_PATH, 'Averages')
if not os.path.exists(new):
    os.makedirs(new)

# loop over each file
for f in files:
    # create DataFrame and convert first column to index
    df = pd.read_csv(f, index_col=[0])
    # average each row, round, and create a one-column DataFrame
    avg = df.mean(axis=1).round(1).to_frame('Average Percentage')
    # remove the index name if necessary
    avg.index.name = None
    print(avg)
    # create the new path
    head, tail = os.path.split(f)
    path = os.path.join(head, 'Averages', tail)
    print(path)
    # write DataFrame to csv
    avg.to_csv(path)
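Note the design choice: all the manual while/try bookkeeping from the original collapses into df.mean(axis=1), because read_csv already treats the first row as the header and the first column (the names) as the index, so only the test scores remain as columns.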
How do I save a string to a specific place in a file? I use path << 'string' to save, but that appends it at the end of the file. The destination XML file ends with </databaseChangeLog>, and I want to write the string into the file before that tag occurs.
There is a Java solution (click), but it works on a static line number. My file will be dynamic; I don't know which line it will be.
def add_to_version() {
    def path = new File('C:/groovy/version-1.xml')
    def branchId = "Promt"
    def lineCount = 0
    def count = path.eachLine { line ->
        if (line.contains('<include file="' + branchId + '/' + branchId + '.xml" ')) {
            lineCount++
        } else if (lineCount == 1) {
            println "package is there"
        }
    }
    if (lineCount == 0) {
        path << '<include file="' + branchId + '/' + branchId + '.xml" ' + 'relativeToChangelogFile="true"/>'
    }
}
The code above does this:
and I want to get XML like this:
You can use an XML parser, like this:
def add_to_version(String branchId) {
    def path = new File('C:/groovy/version-1.xml')
    def xml = new XmlParser().parse(path)
    xml.appendNode("include", [
        file: "${branchId}/${branchId}.xml",
        relativeToChangelogFile: "true"
    ])
    groovy.xml.XmlUtil.serialize(xml, path.newOutputStream())
}
This variant will not keep the XML formatting and comments; however, the XML will be valid.
I'm trying to run the following command in R in order to read a local tab-delimited file as a SQLite database:
library(RSQLite)
banco <- dbConnect(drv = "SQLite",
                   dbname = "data.sqlite")
dbWriteTable(conn = banco,
             name = "Tarefas",
             value = "data.tsv",
             sep = "\t",
             dec = ",",
             na.strings = c("", NA),
             row.names = FALSE,
             header = TRUE)
However, the statements above yield the following error:
Error in read.table(fn, sep = sep, header = header, skip = skip, nrows = nrows, : formal argument "na.strings" matched by multiple actual arguments
Which makes me think I can't pass na.strings explicitly as a read.delim argument. Running dbWriteTable without this argument gives me "RS-DBI driver: (RS_sqlite_import: ./data.tsv line 17696 expected 20 columns of data but found 18)". This is understandable, since I've checked line 17696 and it is almost completely blank.
Another test run using sqldf also gives me an error:
> read.csv2.sql(file = "data.tsv",
+ sql = "CREATE TABLE Tarefas AS SELECT * FROM FILE LIMIT 5",
+ dbname = "data.sqlite",
+ header = TRUE,
+ row.names = FALSE)
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: no such table: FILE)
Which I believe is an unrelated error, but it's still very confusing for someone who's pretty much an absolute SQL noob such as myself. Running read.csv.sql instead gives me this error:
Error in read.table(fn, sep = sep, header = header, skip = skip, nrows = nrows, : more columns than column names
So is there a way to pass na.strings = c("", NA) to dbWriteTable? Is there a better way to read 10 GB tab-delimited files into R aside from sqldf and RSQLite? I've already tried data.table and ff.