snowflake placeholder format to use with the golang driver

I haven't been able to locate the placeholder format to use with the golang driver for Snowflake. The docs at https://godoc.org/github.com/snowflakedb/gosnowflake currently say nothing about it, and the examples at https://github.com/snowflakedb/gosnowflake/tree/81a8e973392a6d20381ab3797de63ba584f8d0d6/cmd don't use placeholders either. Should I be using "?" or "%s"?

The Go Driver for Snowflake implements the standard library's database/sql interfaces.
Assuming that by "placeholder format" you mean SQL statement bind variables, you can use the standard variable syntax supported by Snowflake: either ? (unnamed, positional) or :name (named).
An example of positional-style:
db.ExecContext(ctx, `
    delete from Invoice
    where
        TimeCreated < ?
        and TimeCreated >= ?;`,
    endTime,
    startTime,
)
An example of named-style:
db.ExecContext(ctx, `
    delete from Invoice
    where
        TimeCreated < :end
        and TimeCreated >= :start;`,
    sql.Named("start", startTime),
    sql.Named("end", endTime),
)
The gosnowflake module docs only reference this indirectly, but because the driver implements the database/sql interfaces, it implicitly supports sending regular and named arguments alongside a statement to the Snowflake service.

Related

R : problem with the dplyr::tbl() function due to restricted permission

I work with large databases that are stored on a server.
To work with them in RStudio, I have to open a connection to my Microsoft SQL Server with the dbConnect function:
conn <- dbConnect(odbc(),"myconnection",uid="***",pwd="***",schema="dbo",access="readonly")
and in order to use dplyr, I have to create data references with the tbl function:
data <- tbl(conn, "data")
But one of the remote tables contains a column that I can't read because I don't have access to it, even though I can read everything else.
The SQL query behind the tbl() function is :
SELECT * FROM data
and this is my problem.
Even when I try to select a specific column it doesn't work (see below), so I can't create my references and I can't work.
select(tbl(conn, "data"), "columnX")
which translates to:
SELECT columnX FROM data
I think it is the tbl() function and its underlying "SELECT *" that blocks me.
Do you know what I can do? Are there similar functions that could solve my problem?
If you know which columns you have access to, then one option is to bypass the default SELECT * FROM ... query with your own SQL.
A remote table is defined by two components:
The database connection
The query to the database
When you connect with the default approach, tbl(conn, 'data'), it defaults to the query SELECT * FROM data.
But here is another approach:
custom_query = 'SELECT columnX FROM data'
remote_table = tbl(conn, dbplyr::sql(custom_query))
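You can then chain dplyr verbs onto remote_table as usual; dbplyr treats the custom SQL as a subquery when it builds the final statement.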

Query Snowflake Named Internal Stage by Column NAME and not POSITION

My company is attempting to use Snowflake Named Internal Stages as a data lake to store vendor extracts.
One vendor provides an extract that is 1000+ columns in a pipe-delimited .dat file. This is a canned report that they extract. The column names WILL always remain the same; however, the column locations can change over time without warning.
Based on my research, a user can only query a file in a named internal stage using the following syntax:
--problematic because the order of the columns can change
select t.$1, t.$2 from @mystage1 (file_format => 'myformat', pattern => '.*data.*[.]dat.gz') t;
Is there any way to use the column names instead?
E.g.,
select t.first_name from @mystage1 (file_format => 'myformat', pattern => '.*data.*[.]csv.gz') t;
I appreciate everyone's help and I do realize that this is an unusual requirement.
You could read these files with a UDF. Parse the CSV inside the UDF with code aware of the headers. Then output either multiple columns or one variant.
For example, let's create a .CSV inside Snowflake we can play with later:
create or replace temporary stage my_int_stage
file_format = (type=csv compression=none);

copy into '@my_int_stage/fx3.csv'
from (
    select *
    from snowflake_sample_data.tpcds_sf100tcl.catalog_returns
    limit 200000
)
header=true
single=true
overwrite=true
max_file_size=40772160
;

list @my_int_stage;
-- one ~34MB uncompressed CSV, because why not
Then this is a Python UDF that can read that CSV and parse it into an object, while being aware of the headers:
create or replace function uncsv_py()
returns table(x variant)
language python
imports = ('@my_int_stage/fx3.csv')
handler = 'X'
runtime_version = '3.8'
as $$
import csv
import sys

# Snowflake exposes staged imports under this directory at run time.
IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]

class X:
    def process(self):
        # DictReader keys each row by header name, so the physical
        # column order in the file no longer matters.
        with open(import_dir + 'fx3.csv', newline='') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                yield (row,)
$$;
And then you can query the UDF as if it were a table:
select *
from table(uncsv_py())
limit 10
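Each row comes back as a single variant, so you can project fields out by name with Snowflake's path syntax, e.g. x:CR_ORDER_NUMBER (assuming that header name exists in the file), instead of relying on positions.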
A limitation of what I showed here is that the Python UDF needs an explicit file name (for now); it doesn't take a whole folder. Java UDFs do - it would just take longer to write an equivalent UDF.
https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-tabular-functions.html
https://docs.snowflake.com/en/user-guide/unstructured-data-java.html

select geometry from geopackage in dbbrowser for sqlite

I'm using DB Browser for SQLite with the spatialite module loaded, and I'm trying to read a geopackage I exported earlier from QGIS 3.12.
I ran both of the selections below and retrieved the 'geometry' column with the hex function, and then as (what I assume is) WKT text, but when I export the result as CSV to read in QGIS again, it doesn't work. I read some examples from the sqlite docs about selecting it as AsText or AsBinary or another type readable in QGIS, but those only return empty fields.
Functions here: http://www.gaia-gis.it/gaia-sins/spatialite-sql-4.3.0.html#p16gpkg
First case:
SELECT cd_geocodi, nm_bairro, renmeddom, SRID(geom), hex(geom)
from setorcengeom_rj20171016
where renmeddom > 5000
group by cd_geocodi
order by nm_bairro DESC
limit 10;
Second case:
SELECT cd_geocodi, nm_bairro, renmeddom, SRID(geom) as epsg,
AsText(CastAutomagic(geom)) AS geometry
from setorcengeom_rj20171016
where renmeddom > 5000
group by cd_geocodi
order by nm_bairro DESC
limit 30;
I believe I got the answer! I successfully exported it as CSV and loaded it in QGIS with no problems.
> SELECT cd_geocodi, nm_bairro, renmeddom, SRID(geom) as epsg,
> AsWKT(CastAutomagic(geom)) AS geometry from setorcengeom_rj20171016
> group by cd_geocodi order by nm_bairro DESC

Get max value from multiple columns in sqlalchemy

I was trying to convert the below query to sqlalchemy:
SELECT
    addr_idn,
    (SELECT MAX(LastUpdateDate)
     FROM (VALUES (crt_dt), (upd_dt)) AS UpdateDate(LastUpdateDate)) AS LastUpdateDate
FROM (
    select a.addr_idn, a.crt_dt crt_dt, b.upd_dt upd_dt
    from emp_addr
    where emp_addr.addr_idn = 1
) a
but I am not able to convert this into sqlalchemy. Please help me convert this query.
Credit to: Mike Bayer
The hard part here is the FROM VALUES, which is not built into SQLAlchemy.
There is a recipe at https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PGValues that shows how to build a values() function giving you the VALUES() expression you're looking for.
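If your backend supports a row-wise GREATEST function (MySQL, PostgreSQL, SQL Server 2022+), a simpler alternative to the VALUES recipe is func.greatest. A minimal sketch, assuming a hypothetical emp_addr table with the columns from the question:

from sqlalchemy import Column, DateTime, Integer, MetaData, Table, func, select

metadata = MetaData()

# Hypothetical table definition matching the columns in the question.
emp_addr = Table(
    "emp_addr", metadata,
    Column("addr_idn", Integer, primary_key=True),
    Column("crt_dt", DateTime),
    Column("upd_dt", DateTime),
)

# GREATEST(crt_dt, upd_dt) computes the row-wise max directly,
# sidestepping the FROM VALUES construct entirely.
stmt = select(
    emp_addr.c.addr_idn,
    func.greatest(emp_addr.c.crt_dt, emp_addr.c.upd_dt).label("LastUpdateDate"),
).where(emp_addr.c.addr_idn == 1)

One caveat: on some backends (e.g. MySQL) GREATEST returns NULL if any argument is NULL, whereas MAX over a VALUES list ignores NULLs, so the recipe remains the safer translation when either timestamp can be missing.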

How to pass a string literal parameter to a peewee fn call

I've got an issue passing a string literal parameter to a SQL function using peewee's fn construct. I've got a model defined as:
class User(BaseModel):
    computingID = CharField()
    firstName = CharField()
    lastName = CharField()
    role = ForeignKeyField(Role)
    lastLogin = DateTimeField()

    class Meta:
        database = database
I'm attempting to use the MySQL timestampdiff function in a select to get the number of days since the last login. The query should look something like this:
SELECT t1.`id`, t1.`computingID`, t1.`firstName`, t1.`lastName`, t1.`role_id`, t1.`lastLogin`, timestampdiff(day, t1.`lastLogin`, now()) AS daysSinceLastLogin FROM `user` AS t1
Here's the python peewee code I'm trying to use:
bob = User.select(User, fn.timestampdiff('day', User.lastLogin, fn.now()).alias('daysSinceLastLogin'))
result = bob[0].daysSinceLastLogin
But when I execute this code, I get an error:
ProgrammingError: (1064, u"You have an error in your SQL syntax; check
the manual that corresponds to your MySQL server version for the right
syntax to use near ''day', t1.lastLogin, now()) AS
daysSinceLastLogin FROM user AS t1' at line 1")
Judging from this message, it looks like the quote marks around the 'day' parameter are being retained in the SQL that peewee generates, and MySQL doesn't like quotes around that parameter. I obviously can't leave the quotes off in the Python code, so can someone tell me what I'm doing wrong?
Update: I have my query working as intended by using the SQL() peewee command to add the DAY parameter, sans quote marks:
User.select(User, fn.timestampdiff(SQL('day'), User.lastLogin, fn.now()).alias('daysSinceLastLogin'))
But I'm not sure why I had to use SQL() in this situation. Am I missing anything, or is this the right answer?
Is there a reason you need to use an SQL function to do this?
In part because I'm not very comfortable with SQL functions, I would probably do something like this:
import datetime as dt

# Fetch the User instance however you normally would; filtering on
# computingID here is just one example.
bob = User.get(User.computingID == "bob")
daysSinceLastLogin = (dt.datetime.now() - bob.lastLogin).days
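As for why SQL() is needed in the working query: peewee sends every plain Python value passed to fn() as a bound query parameter, so 'day' reaches MySQL as a quoted string, while timestampdiff expects a bare unit keyword in that position. SQL('day') splices the token into the statement verbatim. If you want to confirm what peewee will send, a quick check (assuming the model above):

query = User.select(User, fn.timestampdiff(SQL('day'), User.lastLogin, fn.now()).alias('daysSinceLastLogin'))
print(query.sql())  # prints the generated SQL and its parameter list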
