PostgreSQL Unique Index Error - database

I'm busy writing a script to restore a database backup and I've run into something strange.
I have a table.sql file which only contains create table structures like
create table ugroups
(
ug_code char(10) not null ,
ug_desc char(60) not null
);
I have a second data.csv file which only contains delimiter data such as
xyz | dummy data
abc | more nothing
fun | is what this is
Then I have a third index.sql file which only creates indexes as such
create unique index i_ugroups on ugroups
(ug_code);
I use the commands from the terminal like so
/opt/postgresql/bin/psql -d dbname -c "\i /tmp/table.sql" # loads table.sql
I have a batch script that loads in the data which works perfectly. Then I use the command
/opt/postgresql/bin/psql -d dbname -c "\i /tmp/index.sql" # loads index.sql
When I try to create the unique indexes it is giving me the error
ERROR: could not create unique index "i_ugroups"
DETAIL: Key (ug_code)=(transfers ) is duplicated.
What's strange is that when I execute the table.sql file and the index.sql file together and load the data last I get no errors and it all works.
Is there something I am missing? why would it not let me create the unique indexes after the data has been loaded?

There are at least two rows whose ug_code column contains the value "transfers ", and that's why the unique index can't be created.
Why it succeeds when you create the index first, I don't know for certain. But I would suspect that when the load tries to insert "transfers " a second time, that single insert simply fails while the rest of the data is inserted successfully.
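A quick way to locate the offending rows before creating the index is a GROUP BY ... HAVING query. Here is a sketch using Python's sqlite3 module standing in for psql; the ugroups table and the sample rows are invented for the demonstration, but the same SELECT works unchanged in PostgreSQL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table ugroups (ug_code char(10) not null, ug_desc char(60) not null)")

# load some data, including a duplicated key (note the trailing space)
rows = [("transfers ", "first"), ("transfers ", "second"), ("xyz", "dummy data")]
con.executemany("insert into ugroups values (?, ?)", rows)

# list every key that appears more than once -- these rows block the unique index
dups = con.execute(
    "select ug_code, count(*) from ugroups group by ug_code having count(*) > 1"
).fetchall()
print(dups)  # [('transfers ', 2)]

# creating the unique index fails while the duplicates are present
try:
    con.execute("create unique index i_ugroups on ugroups (ug_code)")
except sqlite3.IntegrityError as e:
    print(e)
```

Once the duplicates are cleaned up (or deduplicated during the load), the CREATE UNIQUE INDEX succeeds.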

Related

In the tutorial "Tutorial: Bulk Loading from a local file system using copy" what is the difference between my_stage and my_table permissions?

I started to go through the first tutorial for how to load data into Snowflake from a local file.
This is what I have set up so far:
CREATE WAREHOUSE mywh;
CREATE DATABASE Mydb;
Use Database mydb;
CREATE ROLE ANALYST;
grant usage on database mydb to role sysadmin;
grant usage on database mydb to role analyst;
grant usage, create file format, create stage, create table on schema mydb.public to role analyst;
grant operate, usage on warehouse mywh to role analyst;
//tutorial 1 loading data
CREATE FILE FORMAT mycsvformat
TYPE = "CSV"
FIELD_DELIMITER= ','
SKIP_HEADER = 1;
CREATE FILE FORMAT myjsonformat
TYPE="JSON"
STRIP_OUTER_ARRAY = true;
//create stage
CREATE OR REPLACE STAGE my_stage
FILE_FORMAT = mycsvformat;
//Use snowsql for this and make sure that the role, db, and warehouse are selected: put file:///data/data.csv @my_stage;
// put file on stage
PUT file://contacts.csv @my
List @~;
list @%mytable;
Then in my active Snowsql when I run:
Put file:///Users/<user>/Documents/data/data.csv @my_table;
I have confirmed I am in the correct role Accountadmin:
002003 (02000): SQL compilation error:
Stage 'MYDB.PUBLIC.MY_TABLE' does not exist or not authorized.
So then I try to create the table in Snowsql and am successful:
create or replace table my_table(id varchar, link varchar, stuff string);
I still run into this error after I run:
Put file:///Users/<>/Documents/data/data.csv @my_table;
002003 (02000): SQL compilation error:
Stage 'MYDB.PUBLIC.MY_TABLE' does not exist or not authorized.
What is the difference between putting a file to a my_table and a my_stage in this scenario? Thanks for your help!
EDIT:
CREATE OR REPLACE TABLE myjsontable(json variant);
COPY INTO myjsontable
FROM @my_stage/random.json.gz
FILE_FORMAT = (TYPE= 'JSON')
ON_ERROR = 'skip_file';
CREATE OR REPLACE TABLE save_copy_errors AS SELECT * FROM TABLE(VALIDATE(myjsontable, JOB_ID=>'enterid'));
SELECT * FROM SAVE_COPY_ERRORS;
//error for random: Error parsing JSON: invalid character outside of a string: '\\'
//no error for generated
SELECT * FROM Myjsontable;
REMOVE @My_stage pattern = '.*.csv.gz';
REMOVE @My_stage pattern = '.*.json.gz';
//yay, you are done!
The put command copies the file from your local drive to the stage. You should do the put to the stage, not the table.
put file:///Users/<>/Documents/data/data.csv @my_stage;
The copy command loads it from the stage.
But the documentation mentions that a stage gets created by default for every table:
Each table has a Snowflake stage allocated to it by default for storing files. This stage is a convenient option if your files need to be accessible to multiple users and only need to be copied into a single table.
Table stages have the following characteristics and limitations:
Table stages have the same name as the table; e.g. a table named mytable has a stage referenced as @%mytable
In that case, without creating a stage, the file should load into the default Snowflake stage allocated to the table. (Note that a table stage is referenced with the % prefix, e.g. @%my_table; plain @my_table refers to a named stage, which is why the PUT above fails with "does not exist or not authorized".)

Not able to use bcp command line to import data from | delimited file into temp table

I am using bcp command line to load data from pipe delimited file into temp table like below:
Create table tempdb..##test
(
test1 int,
test2 int,
test3 int
)
Select @CMD='BCP "tempdb..##test" IN "'+@sourcefoler +@filename +'" -S '+@servername+' -t "|" -c -q -t -F 2'
EXECUTE master..xp_cmdshell @cmd
In the file the first row is a header; that's why I put the 2 there: "-c -q -t -F 2".
With a temp table I am not able to load the file, but with a normal table it loads fine. The reason I want a temp table is that I want to add a new column to it, queryout the result to a new file, and then load that file into the final table with the updated column.
Thanks in advance !!
Temp tables live only while the session is active, and xp_cmdshell drops its session as soon as the import finishes, so the temp table cannot survive. You need to extend your batch with the other steps (add the column, update it) and do the export in the same batch.

Issues using "-f" flag in CQLSH to run a query.cql file

I'm using cqlsh to add data to Cassandra with the BATCH query and I can load the data with a query using the "-e" flag but not from a file using the "-f" flag. I think that's because the file is local and Cassandra is remote. Details below:
This is a sample of my query (there are more rows to insert, obviously):
BEGIN BATCH;
INSERT INTO keyspace.table (id, field1) VALUES ('1','value1');
INSERT INTO keyspace.table (id, field1) VALUES ('2','value2');
APPLY BATCH;
If I enter the query via the "-e" flag then it works no problem:
>cqlsh -e "BEGIN BATCH; INSERT INTO keyspace.table (id, field1) VALUES ('1','value1'); INSERT INTO keyspace.table (id, field1) VALUES ('2','value2'); APPLY BATCH;" -u username -p password -k keyspace 99.99.99.99
But if I save the query to a text file (query.cql) and call as below, I get the following output:
>cqlsh -f query.cql -u username -p password -k keyspace 99.99.99.99
Using 3 child processes
Starting copy of keyspace.table with columns ['id', 'field1'].
Processed: 0 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s
0 rows imported from 0 files in 0.076 seconds (0 skipped).
Cassandra obviously accepts the command but doesn't read the file, I'm guessing that's because the Cassandra is located on a remote server and the file is located locally. The Cassandra instance I'm using is a managed service with other users, so I don't have access to it to copy files into folders.
How do I run this query on a remote instance of Cassandra where I only have CLI access?
I want to be able to use another tool to build the query.cql file and have a batch job run the command with the "-f" flag but I can't work out how I'm going wrong.
You're executing a local cqlsh client so it should be able to access your local query.cql file.
Try removing the BEGIN BATCH and APPLY BATCH and just leave the 2 INSERT statements in query.cql, then retry.
One other solution to insert data quickly is to provide a csv file and use the COPY command inside cqlsh. Read this blog post: http://www.datastax.com/dev/blog/new-features-in-cqlsh-copy
Scripting insert by generating one cqlsh -e '...' per line is feasible but it will be horribly slow
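If another tool is building query.cql anyway, generating one plain INSERT per input row is easy to script. A minimal sketch in Python; the keyspace, table, and column names are the placeholder ones from the question, and the naive quoting assumes values contain no single quotes:

```python
import csv
import io

def build_cql(csv_text, keyspace="keyspace", table="table"):
    """Turn a two-column CSV (id, field1) into the text of a .cql file of INSERTs."""
    out = []
    for row in csv.reader(io.StringIO(csv_text)):
        ident, value = row
        out.append(
            "INSERT INTO %s.%s (id, field1) VALUES ('%s','%s');"
            % (keyspace, table, ident, value)
        )
    return "\n".join(out) + "\n"

cql = build_cql("1,value1\n2,value2\n")
print(cql)
# INSERT INTO keyspace.table (id, field1) VALUES ('1','value1');
# INSERT INTO keyspace.table (id, field1) VALUES ('2','value2');
```

Write the returned string to query.cql and run it with cqlsh -f as before.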

How to count the number of columns in SQLite from the command line tool

I am trying to count the columns from a sqlite db using the sqlite command line tool. To test it I created a sample db like this:
c:\>sqlite.exe mydb.sqlite "create table tbl1(one varchar(10), two smallint);"
Now let's say I don't know that the table tbl1 has 2 columns; how can I find that using a query from the command line tool?
Run:
pragma table_info(yourTableName)
See:
http://www.sqlite.org/pragma.html#pragma_table_info
for more details.
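Since the pragma returns one row per column, counting its rows gives the column count. A small sketch with Python's sqlite3 module (from the shell, the equivalent is piping the pragma's output through wc -l, as the next answer does):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table tbl1(one varchar(10), two smallint)")

# PRAGMA table_info returns one row per column of the table
cols = con.execute("pragma table_info(tbl1)").fetchall()
print(len(cols))  # 2
```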
Here is a way I found useful under Linux. Create a bash script file columns.sh, ensure it has execute permissions, and copy-paste the following code.
columns() { for table in $(echo ".tables" | sqlite3 $1); do echo "$table $(echo "PRAGMA table_info($table);" | sqlite3 $1 | wc -l)"; done ;}
Then run the following command in a terminal to return the results:
$ columns <database name>
<table1> <# of columns>
<table2> <# of columns>
Note: Ensure database is not corrupted or encrypted.
source: http://www.quora.com/SQLite/How-can-I-count-the-number-of-columns-in-a-table-from-the-shell-in-SQLite
UPDATE
Here is an interesting URL for a Python script solution:
http://pagehalffull.wordpress.com/2012/11/14/python-script-to-count-tables-columns-and-rows-in-sqlite-database/

Using COPY FROM stdin to load tables, reading input file only once

I've got a large (~60 million row) fixed width source file with ~1800 records per row.
I need to load this file into 5 different tables on an instance of Postgres 8.3.9.
My dilemma is that, because the file is so large, I'd like to read it only once.
This is straightforward enough using INSERT or COPY as normal, but I'm trying to get a load speed boost by including my COPY FROM statements in a transaction that includes a TRUNCATE--avoiding logging, which is supposed to give a considerable load speed boost (according to http://www.cirrusql.com/node/3). As I understand it, you can disable logging in Postgres 9.x--but I don't have that option on 8.3.9.
The script below has me reading the input file twice, which I want to avoid... any ideas on how I could accomplish this by reading the input file only once? Doesn't have to be bash--I also tried using psycopg2, but couldn't figure out how to stream file output into the COPY statement as I'm doing below. I can't COPY FROM file because I need to parse it on the fly.
#!/bin/bash
table1="copytest1"
table2="copytest2"
#note: $1 refers to the first argument used when invoking this script
#which should be the location of the file one wishes to have python
#parse and stream out into psql to be copied into the data tables
( echo 'BEGIN;'
echo 'TRUNCATE TABLE ' ${table1} ';'
echo 'COPY ' ${table1} ' FROM STDIN'
echo "WITH NULL AS '';"
cat $1 | python2.5 ~/parse_${table1}.py
echo '\.'
echo 'TRUNCATE TABLE ' ${table2} ';'
echo 'COPY ' ${table2} ' FROM STDIN'
echo "WITH NULL AS '';"
cat $1 | python2.5 ~/parse_${table2}.py
echo '\.'
echo 'COMMIT;'
) | psql -U postgres -h chewy.somehost.com -p 5473 -d db_name
exit 0
Thanks!
You could use named pipes instead of your anonymous pipe.
With this concept your python script could fill the tables through different psql processes with the corresponding data.
Create pipes:
mkfifo fifo_table1
mkfifo fifo_table2
Run psql instances:
psql db_name < fifo_table1 &
psql db_name < fifo_table2 &
Your Python script would look something like this (pseudocode):
SQL_BEGIN = """
BEGIN;
TRUNCATE TABLE %s;
COPY %s FROM STDIN WITH NULL AS '';
"""
fifo1 = open('fifo_table1', 'w')
fifo2 = open('fifo_table2', 'w')
bigfile = open('mybigfile', 'r')
print >> fifo1, SQL_BEGIN % ('table1', 'table1') # ugly; with python2.6 you could use .format() syntax
print >> fifo2, SQL_BEGIN % ('table2', 'table2')
for line in bigfile:
    # your code, which decides where the data belongs to
    # if data belongs to table1
    print >> fifo1, data
    # else
    print >> fifo2, data
print >> fifo1, '\\.'  # end-of-data marker terminates the COPY
print >> fifo2, '\\.'
print >> fifo1, 'COMMIT;'
print >> fifo2, 'COMMIT;'
fifo1.close()
fifo2.close()
Maybe this is not the most elegant solution, but it should work.
Why use COPY for the second table? I would assume that doing a:
INSERT INTO table2 (...)
SELECT ...
FROM table1;
would be faster than using COPY.
Edit
If you need to import different rows into different tables but from the same source file, maybe inserting everything into a staging table and then inserting the rows from there into the target tables is faster:
Import the *whole* text file into one staging table:
COPY staging_table FROM STDIN ...;
After that step, the whole input file is in staging_table
Then copy the rows from the staging table to the individual target tables by selecting only those that qualify for the corresponding table:
INSERT INTO table_1 (...)
SELECT ...
FROM staging_table
WHERE (conditions for table_1);
INSERT INTO table_2 (...)
SELECT ...
FROM staging_table
WHERE (conditions for table_2);
This is of course only feasible if you have enough space in your database to keep the staging table around.
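The two-step routing can be demonstrated generically. A sketch using Python's sqlite3 module in place of Postgres, with plain inserts standing in for COPY; the rec_type column and the routing conditions are invented for the example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table staging_table (rec_type text, payload text);
    create table table_1 (payload text);
    create table table_2 (payload text);
""")

# step 1: the whole input lands in the staging table (a single COPY in Postgres)
con.executemany(
    "insert into staging_table values (?, ?)",
    [("A", "row for table 1"), ("B", "row for table 2"), ("A", "another for table 1")],
)

# step 2: route rows to their target tables with INSERT ... SELECT
con.execute("insert into table_1 select payload from staging_table where rec_type = 'A'")
con.execute("insert into table_2 select payload from staging_table where rec_type = 'B'")

n1 = con.execute("select count(*) from table_1").fetchone()[0]
n2 = con.execute("select count(*) from table_2").fetchone()[0]
print(n1, n2)  # 2 1
```

In Postgres the first step would be a single COPY staging_table FROM STDIN, so the source file is read exactly once.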
