I'm having issues testing a data unload flow from Snowflake using the GET command to store the files on my local machine.
Following the documentation here, it should be as simple as creating a stage, copying the data I want to that stage, and then running a snowsql command locally to retrieve the files.
I'm on Windows 10, running the following snowsql command to try to unload the data, against a database populated with the test TPC-H data that Snowflake provides:
snowsql -a <account id> -u <username> -q "
USE DATABASE TESTDB;
CREATE OR REPLACE STAGE TESTSNOWFLAKESTAGE;
copy into @TESTSNOWFLAKESTAGE/supplier from SUPPLIER;
GET @TESTSNOWFLAKESTAGE file://C:/Users/<local user>/Downloads/unload;"
All commands run successfully, except for the final GET:
SnowSQL * v1.2.14
Type SQL statements or !help
+----------------------------------+
| status |
|----------------------------------|
| Statement executed successfully. |
+----------------------------------+
1 Row(s) produced. Time Elapsed: 0.121s
+-----------------------------------------------------+
| status                                              |
|-----------------------------------------------------|
| Stage area TESTSNOWFLAKESTAGE successfully created.  |
+-----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.293s
+---------------+-------------+--------------+
| rows_unloaded | input_bytes | output_bytes |
|---------------+-------------+--------------|
| 100000 | 14137839 | 5636225 |
+---------------+-------------+--------------+
1 Row(s) produced. Time Elapsed: 7.548s
+-----------------------+------+--------+------------------------------------------------------------------------------------------------------+
| file | size | status | message |
|-----------------------+------+--------+------------------------------------------------------------------------------------------------------|
| supplier_0_0_0.csv.gz | -1 | ERROR | An error occurred (403) when calling the HeadObject operation: Forbidden, file=supplier_0_0_0.csv.gz |
+-----------------------+------+--------+------------------------------------------------------------------------------------------------------+
1 Row(s) produced. Time Elapsed: 1.434s
This 403 looks like it's coming from the S3 instance backing my Snowflake account, but that's part of the abstracted service layer provided by Snowflake, so I'm not sure where I would have to go to flip auth switches.
Any guidance is much appreciated.
You need to use Windows-style backslashes in your local file path. So, assuming that, per @NickW's point, you are filling in your local user correctly, the format should be like the following:
file://C:\Users\<local user>\Downloads
There are some examples in the documentation for this here:
https://docs.snowflake.com/en/sql-reference/sql/get.html#required-parameters
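For example, keeping everything else in the question's command the same, the final statement would then look like this (a sketch only; <local user> still has to be filled in):
GET @TESTSNOWFLAKESTAGE file://C:\Users\<local user>\Downloads\unload;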
I am trying to connect to SQL Server with spark-jdbc, using JDBC_SESSION_INIT_STATEMENT to create a temporary table and then download data from the temporary table in the main query.
I have the following code:
//df is org.apache.spark.sql.DataFrameReader
val s = """select * into #tmp_table from ( SELECT op.ID,
| op.Date,
| op.DocumentID,
| op.Amount,
| op.AmountCurr,
| op.CurrencyID,
| operson.ObjectTypeId AS PersonOT,
| op.PersonID,
| ocontract.ObjectTypeId AS ContractOT,
| op.ContractID,
| op.DocNum,
| op.MomentCreate,
| op.ObjectTypeID,
| op.OwnerObjectID
|FROM dbo.Operation op With (Index = IX_Operation_Date) -- Without the hint it sometimes falls back to a full table scan
|LEFT JOIN dbo.Object ocontract ON op.ContractID = ocontract.ID
|LEFT JOIN dbo.Object operson ON op.PersonID = operson.ID
|WHERE op.Date>='2019-01-01' and op.Date<'2020-01-01' AND 1=1
|) wrap_for_single_connect
|OPTION (LOOP JOIN, FORCE ORDER, MAX_GRANT_PERCENT=25)""".stripMargin
df
.option(JDBCOptions.JDBC_SESSION_INIT_STATEMENT, s)
.jdbc(
jdbcUrl,
"(select * from tempdb.#tmp_table) sub",
connectionProps)
I get com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '#tmp_table'.
And I have a feeling that JDBC_SESSION_INIT_STATEMENT is not working, because I deliberately tried to mess up the request and still got the Invalid object error.
How can I check if the request is working in JDBC_SESSION_INIT_STATEMENT?
One way to know whether your JDBCOptions.JDBC_SESSION_INIT_STATEMENT is executed is to enable INFO logging level for org.apache.spark.sql.execution.datasources.jdbc logger.
That should trigger the line in Spark's JDBCRDD that runs the statement and print out the following message to the logs:
Executing sessionInitStatement: [sql]
Given the comment in the source, I don't think you should use it to create a source table to load records from:
// This executes a generic SQL statement (or PL/SQL block) before reading
// the table/query via JDBC. Use this feature to initialize the database
// session environment, e.g. for optimizations and/or troubleshooting.
You should use the dbtable or query parameter instead.
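For instance, instead of materializing a temp table through the session init statement, the SELECT can be handed to the reader directly as a derived table. The sketch below reuses the question's jdbcUrl and connectionProps (assumed to be defined as before) and shortens the inner query for brevity:
// Sketch only: pass the SELECT itself as the JDBC source instead of a session-init temp table.
// spark, jdbcUrl and connectionProps are assumed to be the same values used in the question.
val innerQuery =
  """SELECT op.ID, op.Date, op.DocumentID, op.Amount
    |FROM dbo.Operation op
    |WHERE op.Date >= '2019-01-01' AND op.Date < '2020-01-01'""".stripMargin

val operations = spark.read
  .jdbc(jdbcUrl, s"($innerQuery) wrap_for_single_connect", connectionProps)
Equivalently, the same SELECT can go into the query option of the jdbc data source.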
I am completing a Snowflake university workshop but I have run into a problem. The course has provided an AVRO file and asked us to insert the data into a table with a VARIANT column. However, when I run the COPY INTO command I get this error:
Number of columns in file (11) does not match that of the corresponding table (1), use file format option error_on_column_count_mismatch=false to ignore this error File 'iot_files/iot_files_sample_output.avro', line 1, character 827 Row 1, column "IOT_AVRO_DATA"[11] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
These are the instructions given by the course:
CREATE OR REPLACE TABLE IOT_AVRO_DATA
(mycolumn VARIANT);
copy INTO IOT_AVRO_DATA
FROM @GOOGLE_BUCKET_SFHOL/iot_files/iot_files_sample_output.avro;
FILE_FORMAT = (type = AVRO);
It looks like there is a mismatch between the number of columns in the file and in the table.
Any help or advice would be appreciated; I tried reaching out to Snowflake via the workshop but they have not responded.
Are you sure your AVRO file is not corrupted?
The following works fine for me:
Upload to my stage a sample avro file (userdata1.avro taken from here)
spanaite#(no warehouse)@SERGIU_DB.(no schema)>put file:///Users/spanaite/Downloads/userdata1.avro @~;
+----------------+-------------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source | target | source_size | target_size | source_compression | target_compression | status | message |
|----------------+-------------------+-------------+-------------+--------------------+--------------------+----------+---------|
| userdata1.avro | userdata1.avro.gz | 93561 | 79248 | NONE | GZIP | UPLOADED | |
+----------------+-------------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 3.026s
spanaite#(no warehouse)@SERGIU_DB.(no schema)>
Create a table and load the avro file:
create or replace table test_avro(mycolumn VARIANT);
copy into test_avro from @~/userdata1.avro.gz file_format = (type = AVRO);
select * from test_avro;
Try with one of the sample files from the link I posted above.
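If the load succeeds, one quick way to confirm that the records really landed in the VARIANT column is to pull a field out with path notation. This is only a sketch: first_name is a field in the userdata1.avro sample used above, and the workshop file will have different field names:
select mycolumn:first_name::string as first_name from test_avro limit 5;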
After searching the TDengine online documentation (https://www.taosdata.com/en/documentation/), I found the command to change the default database parameter "keep", which indicates how long data will be preserved in databases. However, after I typed in that command from the shell, the "show variables" command still shows the old value. How can I tell whether changing this parameter has taken effect?
taos> alter database test keep 50;
Query OK, 0 of 0 row(s) in database (0.019087s)
taos> show variables;
name | value |
============================================================
version | 2.1.5.0 |
buildinfo | Built at 2021-08-05 23:49:17 |
walLevel | 1 |
comp | 2 |
precision | 0 |
maxRows | 4096 |
minRows | 100 |
keep | 3650 |
The alter command takes effect at the database level, while show variables shows the global parameters.
You can use show databases; to check the database-level parameters.
If you want to change what show variables; displays, you need to modify the config file /etc/taos.cfg.
Also, only several parameters can be modified with the alter command.
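For example (a sketch assuming the database from the question is named test), the new value shows up in the keep column of show databases; rather than in show variables; (the exact column label varies slightly between TDengine versions):
taos> alter database test keep 50;
taos> show databases;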
I have a CentOS server running a local MemSQL cluster (aggregator and leaf on the same machine). I have a database named offers. For some reason, I cannot execute any queries against tables in my database.
Everything was working fine until I tried to add another machine to the cluster. I had the IT team at my place replicate the server I was working on (completely). I went over to the replicated server, deleted the database in question, and then registered the server using the memsql-toolbox-config register-node command. Then the database showed it was in the transition state. I restarted MemSQL using memsql-ops and got to this situation.
Running a simple query yields:
memsql> select * from table;
ERROR 2261 (HY000): Query `select * from table` couldn't be executed because of an in progress failover operation. Check the status of the leaf nodes in the cluster (error 1049:'Leaf Error (172.26.32.20:3307): Unknown database 'offers_5'')
The output of the cluster status command is:
memsql> show cluster status;
+---------+--------------+------+----------+-------------+-------------+----------+--------------+-------------+-------------------------+----------------------+----------------------+---------------+-------------------------------------------------+
| Node ID | Host | Port | Database | Role | State | Position | Master Host | Master Port | Metadata Master Node ID | Metadata Master Host | Metadata Master Port | Metadata Role | Details |
+---------+--------------+------+----------+-------------+-------------+----------+--------------+-------------+-------------------------+----------------------+----------------------+---------------+-------------------------------------------------+
| 1 | 172.26.32.20 | 3306 | cluster | master | online | 0:181 | NULL | NULL | NULL | NULL | NULL | Reference | |
| 1 | 172.26.32.20 | 3306 | offers | master | online | 0:156505 | NULL | NULL | NULL | NULL | NULL | Reference | |
| 2 | 172.26.32.20 | 3307 | cluster | async slave | replicating | 0:180 | 172.26.32.20 | 3306 | 1 | 172.26.32.20 | 3306 | Reference | stage: packet wait, state: x_streaming, err: no |
| 2 | 172.26.32.20 | 3307 | offers | sync slave | replicating | 0:156505 | 172.26.32.20 | 3306 | 1 | 172.26.32.20 | 3306 | Reference | |
+---------+--------------+------+----------+-------------+-------------+----------+--------------+-------------+-------------------------+----------------------+----------------------+---------------+-------------------------------------------------+
4 rows in set (0.00 sec)
So it seems that the second node is replicating. Also note the details column saying:
stage: packet wait, state: x_streaming, err: no
Running the replication status command gives:
memsql> show replication status;
+--------+----------+------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+---------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
| Role | Database | Master_URI | Master_State | Master_CommitLSN | Master_HardenedLSN | Master_ReplayLSN | Master_TailLSN | Master_Commits | Connected | Slave_URI | Slave_State | Slave_CommitLSN | Slave_HardenedLSN | Slave_ReplayLSN | Slave_TailLSN | Slave_Commits |
+--------+----------+------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+---------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
| master | cluster | NULL | online | 0:181 | 0:181 | 0:177 | 0:181 | 86 | yes | 172.26.32.20:3307/cluster | replicating | 0:180 | 0:181 | 0:180 | 0:181 | 84 |
| master | offers | NULL | online | 0:156505 | 0:156505 | 0:156505 | 0:156505 | 183 | yes | 172.26.32.20:3307/offers | replicating | 0:156505 | 0:156505 | 0:156505 | 0:156505 | 183 |
+--------+----------+------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+---------------------------+-------------+-----------------+-------------------+-----------------+---------------+---------------+
2 rows in set (0.00 sec)
I never initiated any failover or replication. Does anyone know why this is happening? How could I solve this?
EDIT:
Using memsql-ops I get:
[me@memsql ~]$ memsql-ops memsql-list
ID       Agent Id  Process State  Cluster State  Role    Host          Port  Version
33829AF  Af13af7   RUNNING        CONNECTED      MASTER  172.26.32.20  3306  6.5.18
BBA1B61  Af13af7   RUNNING        CONNECTED      LEAF    172.26.32.20  3307  6.5.18
But with memsql-admin, with the new memsql tools:
[me@memsql ~]$ memsql-admin list-nodes
✘ Failed to list nodes on all hosts: failed to list nodes on 1 host:
172.26.32.20
No nodes found
Making my question a bit clearer: how can I get my server to respond to queries again? And after I do, how should I go about adding another host? Should I completely clean the replicated server of any MemSQL data?
2nd EDIT:
I managed to solve this problem by deleting my database and cluster data and setting up a new cluster using the new MemSQL tools, throwing away MemSQL Ops. Read my answer.
It looks like there are a couple of things that might be causing problems. Generally speaking, cloning a MemSQL server is neither supported nor the best way to go about adding nodes. It also looks like you may be using both the older Ops management tool and the newer MemSQL tools. I would recommend not installing or using Ops and sticking to just the new MemSQL tools instead.
A good place to start would be to recreate the nodes after cloning; a cloned MemSQL node won't correctly become part of the cluster. You should also verify that you don't have more than one master aggregator in the cluster. If you start with that and see whether it resolves your issues, I'm happy to help with any other problems that you run into.
I managed to set up a working cluster.
As micahbhakti mentioned in his answer, I tried using only the newer MemSQL tools instead of the deprecated MemSQL Ops. This required deleting the existing MemSQL Ops agent on both servers and then following the tutorial in the MemSQL documentation. Here are the steps I took, for anyone struggling with this issue, which is better described as: my MemSQL-Ops-managed MemSQL cluster is not responding. How can I upgrade it to a working MemSQL-tools-managed cluster?
1. Save what data you can
The next step deletes all MemSQL data, so it would be best to save your data first. The table data can easily be stored in CSV files with a simple
SELECT * FROM important_data_containing_table INTO OUTFILE '/home/yourfolder/yourcsvfile.csv';
Do this for all tables containing important data. You should also save the schema itself. You can do that by viewing and copying into another file all the CREATE queries you originally used to create the tables, so you can re-execute them later. Use this:
SHOW CREATE TABLE your_table_name
The MySQL documentation for this is here. The syntax is not necessarily identical in MemSQL, but the base command above works. For exact information, read about MySQL Features Unsupported in MemSQL.
2. Delete anything to do with Memsql-Ops
As it is said here about the uninstall command:
Stops the local MemSQL Ops agent and deletes all its data.
If MemSQL nodes are already installed in the local host, this command will prompt users to delete those nodes first before proceeding with the uninstall.
And indeed, if there are nodes running (in my case there were), you will be prompted to run another command to delete those nodes: memsql-ops memsql-delete --all. This WILL delete all data in your database, as stated in its documentation:
Deletes all data for a MemSQL node. This operation is not reversible and may lead to data loss. Users who want to perform this operation are prompted to explicitly type ‘DELETE’ to be sure of their decision.
That's why I asked you to save whatever you need :)
This should be done for each host you want to include in your new shiny cluster.
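Roughly, on each host this boils down to two commands (a sketch; both are interactive and prompt for confirmation before deleting anything):
memsql-ops memsql-delete --all
memsql-ops uninstall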
3. Follow the instructions to create the new cluster using MemSQL tools
After you have cleaned your servers of the deprecated MemSQL Ops agent and data, you can follow the instructions here. I chose a comprehensive multiple-host setup. The process will ask you to register your hosts and then set up the node roles (master aggregator, aggregators and leaves), IP addresses, passwords, ports, and so on.
After that, you can test the cluster by making changes on one machine and viewing them on another. The output of memsql-admin list-nodes on the deployment machine for my cluster was:
+------------+------------+--------------+------+---------------+--------------+---------+----------------+--------------------+
| MemSQL ID | Role | Host | Port | Process State | Connectable? | Version | Recovery State | Availability Group |
+------------+------------+--------------+------+---------------+--------------+---------+----------------+--------------------+
| AAAAAAAAAA | Master | 172.26.32.20 | 3306 | Running | True | 6.7.16 | Online | |
| BBBBBBBBBB | Aggregator | 172.26.32.22 | 3306 | Running | True | 6.7.16 | Online | |
| CCCCCCCCCC | Leaf | 172.26.32.20 | 3307 | Running | True | 6.7.16 | Online | 1 |
| DDDDDDDDDD | Leaf | 172.26.32.22 | 3307 | Running | True | 6.7.16 | Online | 1 |
+------------+------------+--------------+------+---------------+--------------+---------+----------------+--------------------+
4. Restore the data
Re-execute all the CREATE TABLE queries you saved in step 1, and import all the data exported to CSV using this syntax:
LOAD DATA INFILE '/home/yourfolder/yourcsvfile.csv' INTO TABLE your_table;
And that's it! Now you can manage your cluster using the new MemSQL Studio, which runs by default at http://your_deployment_machine:8080.
Enjoy :)
https://github.com/markfink/dbslim
I'd like to execute stored procedures with DbSlim using FitNesse (Selenium, Xebium).
Now, what I tried to do is:
!define dbQuerySelectCustomerbalance (
execute dbo.uspLogError
)
| script | Db Slim Select Query | !-${dbQuerySelectCustomerbalance}-! |
This gives a green indicator; however, Microsoft SQL Server Profiler shows no actions/logging.
So what I'd like to know is: is it possible to use DbSlim for executing stored procedures, and if so, what is the correct way to do it?
By the way, I have the connection to the database on one page, and on the query page I included the connection to the database. (Is that OK?)
Take out the !- ... -!. It is used to escape wikified words. But in this case you want it to be translated to the actual query.
!define dbQuerySelectCustomerbalance ( execute dbo.uspLogError )
| script | Db Slim Select Query | ${dbQuerySelectCustomerbalance} |
| show | data by column index | 1 | and row index | 1 |
You can add the last line, which outputs the first column of the first row, for testing purposes if your SP returns a result (or you can create a simple SP just to test this out).
Specifying the connection anywhere before this block is fine, be it on the same page or in a SetUp/SuiteSetUp/normal page that is included or executed before it.