How to count the number of columns in SQLite from the command-line tool

I am trying to count the columns of a table in an SQLite database using the sqlite command-line tool. To test it, I created a sample database like this:
c:\>sqlite.exe mydb.sqlite "create table tbl1(one varchar(10), two smallint);"
Now let's say I don't know that table tbl1 has 2 columns. How can I find that out using a query from the command-line tool?

Run:
pragma table_info(yourTableName);
See:
http://www.sqlite.org/pragma.html#pragma_table_info
for more details.
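If you need the count in a script rather than by eye, the same PRAGMA can be run from Python's built-in sqlite3 module: table_info returns one row per column, so the column count is simply the number of rows returned. A minimal sketch; count_columns is a hypothetical helper name, and an in-memory database stands in for mydb.sqlite:

```python
import sqlite3


def count_columns(conn, table):
    """Return the number of columns in `table` using PRAGMA table_info."""
    # Quote the table name as an SQL identifier (embedded double quotes are doubled)
    # before splicing it into the PRAGMA statement.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute("PRAGMA table_info(%s)" % quoted).fetchall()
    return len(rows)  # table_info yields one row per column; none for a missing table


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use "mydb.sqlite" for the real file
    conn.execute("create table tbl1(one varchar(10), two smallint)")
    print(count_columns(conn, "tbl1"))  # prints 2
```

Note that a nonexistent table simply yields zero rows rather than an error, so a result of 0 usually means a misspelled table name.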

Here is a way I found useful under Linux. Create a bash script file columns.sh containing the following function, then load it into your current shell with ". ./columns.sh" (executing the script directly would only define the function in a subshell).
columns() { for table in $(echo ".tables" | sqlite3 "$1"); do echo "$table $(echo "PRAGMA table_info($table);" | sqlite3 "$1" | wc -l)"; done; }
Then type the following command in a terminal; each line of output shows a table name and its number of columns:
$ columns <database name>
<table1> <# of columns>
<table2> <# of columns>
Note: make sure the database is neither corrupted nor encrypted.
source: http://www.quora.com/SQLite/How-can-I-count-the-number-of-columns-in-a-table-from-the-shell-in-SQLite
UPDATE
Here is an interesting Python script solution:
http://pagehalffull.wordpress.com/2012/11/14/python-script-to-count-tables-columns-and-rows-in-sqlite-database/
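The linked Python script also counts rows. A rough equivalent of the bash function above, extended with row counts, can be sketched with the standard sqlite3 module. table_stats is a hypothetical name, and tables are listed from sqlite_master rather than from the shell's .tables command:

```python
import sqlite3
import sys


def table_stats(path):
    """Return {table_name: (column_count, row_count)} for every user table."""
    conn = sqlite3.connect(path)
    try:
        # sqlite_master lists the schema; skip SQLite's internal sqlite_* tables.
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        stats = {}
        for name in tables:
            quoted = '"' + name.replace('"', '""') + '"'  # quote as an identifier
            n_cols = len(conn.execute("PRAGMA table_info(%s)" % quoted).fetchall())
            n_rows = conn.execute("SELECT count(*) FROM %s" % quoted).fetchone()[0]
            stats[name] = (n_cols, n_rows)
        return stats
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python table_stats.py mydb.sqlite [other.sqlite ...]
    for path in sys.argv[1:]:
        for name, (cols, rows) in sorted(table_stats(path).items()):
            print(name, cols, rows)
```

Each output line shows the table name, its column count and its row count. The script name in the usage comment is only an example.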

Related

Issues using "-f" flag in CQLSH to run a query.cql file

I'm using cqlsh to add data to Cassandra with a BATCH query. I can load the data by passing the query with the "-e" flag, but not from a file with the "-f" flag. I think that's because the file is local and Cassandra is remote. Details below:
This is a sample of my query (there are more rows to insert, obviously):
BEGIN BATCH;
INSERT INTO keyspace.table (id, field1) VALUES ('1','value1');
INSERT INTO keyspace.table (id, field1) VALUES ('2','value2');
APPLY BATCH;
If I enter the query via the "-e" flag then it works no problem:
>cqlsh -e "BEGIN BATCH; INSERT INTO keyspace.table (id, field1) VALUES ('1','value1'); INSERT INTO keyspace.table (id, field1) VALUES ('2','value2'); APPLY BATCH;" -u username -p password -k keyspace 99.99.99.99
But if I save the query to a text file (query.cql) and call as below, I get the following output:
>cqlsh -f query.cql -u username -p password -k keyspace 99.99.99.99
Using 3 child processes
Starting copy of keyspace.table with columns ['id', 'field1'].
Processed: 0 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s
0 rows imported from 0 files in 0.076 seconds (0 skipped).
Cassandra obviously accepts the command but doesn't read the file. I'm guessing that's because the Cassandra instance is on a remote server while the file is local. The Cassandra instance I'm using is a managed service shared with other users, so I don't have access to it to copy files into its folders.
How do I run this query on a remote instance of Cassandra where I only have CLI access?
I want to be able to use another tool to build the query.cql file and have a batch job run the command with the "-f" flag, but I can't work out what I'm doing wrong.
You're executing a local cqlsh client so it should be able to access your local query.cql file.
Try removing the BEGIN BATCH and APPLY BATCH lines, leaving just the two INSERT statements in query.cql, and run it again.
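Since the goal is to have another tool build query.cql, here is a minimal Python sketch of such a generator that emits plain INSERT statements without BEGIN/APPLY BATCH, as suggested above. The function names are hypothetical; only str and int values are handled, and single quotes inside strings are escaped by doubling them, which is how CQL string literals escape quotes:

```python
def cql_literal(value):
    """Render an int or str as a CQL literal (single quotes doubled inside strings)."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_insert_script(table, columns, rows):
    """Return the text of a .cql file with one INSERT statement per row."""
    col_list = ", ".join(columns)
    lines = []
    for row in rows:
        values = ", ".join(cql_literal(v) for v in row)
        lines.append("INSERT INTO %s (%s) VALUES (%s);" % (table, col_list, values))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # keyspace.table, id and field1 are the placeholders from the question.
    rows = [("1", "value1"), ("2", "value2")]
    print(build_insert_script("keyspace.table", ["id", "field1"], rows), end="")
```

Redirect its output to a file, e.g. python build_cql.py > query.cql (the script name is illustrative), and then run cqlsh -f query.cql -u username -p password -k keyspace 99.99.99.99 as in the question.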
One other solution to insert data quickly is to provide a csv file and use the COPY command inside cqlsh. Read this blog post: http://www.datastax.com/dev/blog/new-features-in-cqlsh-copy
Scripting the inserts by generating one cqlsh -e '...' call per line is feasible, but it will be horribly slow.
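For the COPY route, the CSV file can be produced by the same kind of tool. cqlsh's COPY ... FROM reads the file on the machine where cqlsh itself runs, so a local file works even though Cassandra is remote. A minimal sketch; write_copy_input and the file path are hypothetical, and keyspace.table/id/field1 are the placeholders from the question:

```python
import csv


def write_copy_input(csv_path, table, columns, rows):
    """Write rows to a CSV file with a header line and return a matching cqlsh COPY command."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header line, skipped by HEADER = TRUE
        writer.writerows(rows)    # values containing commas are quoted automatically
    return "COPY %s (%s) FROM '%s' WITH HEADER = TRUE;" % (
        table, ", ".join(columns), csv_path)
```

The returned command can be passed to cqlsh -e, or written to a .cql file and run with -f.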

Extract data one by one from a database through a shell script

I have to code in Korn shell. I have to take data from one database, create "insert into" statements in a .sql file, and run this .sql file against another database.
There are 24 columns in the table, and I'm not able to extract the data from that table one by one in order to build the INSERT INTO statements.
Can anyone help me with this?
So far I have written the following code (just a sample, with data from two columns):
$ cat analysis.sh
#!/bin/ksh
function sqlQuery {
ied sqlplus -s / << 'EOF'
DEFINE DELIMITER='${TAB_SPACE}'
set heading OFF termout ON trimout ON feedback OFF
set pagesize 0
SELECT ID, H00
FROM SW_ABC
WHERE ID=361140;
EOF
}
eval x=(`sqlQuery`)
ID=${x[0]}
HOUR=${x[1]}
echo ID is $ID
echo HOUR is $HOUR
But here eval is not working.

PostgreSQL Unique Index Error

I'm busy writing a script to restore a database backup and I've run into something strange.
I have a table.sql file that only contains CREATE TABLE statements like this:
create table ugroups
(
ug_code char(10) not null ,
ug_desc char(60) not null
);
I have a second file, data.csv, that only contains delimited data such as:
xyz | dummy data
abc | more nothing
fun | is what this is
Then I have a third file, index.sql, that only creates indexes, like this:
create unique index i_ugroups on ugroups
(ug_code);
I run the commands from the terminal like so:
/opt/postgresql/bin/psql -d dbname -c "\i /tmp/table.sql" # loads table.sql
A batch script then loads the data, which works perfectly. Then I run:
/opt/postgresql/bin/psql -d dbname -c "\i /tmp/index.sql" # loads index.sql
When I try to create the unique indexes, it gives me this error:
ERROR: could not create unique index "i_ugroups"
DETAIL: Key (ug_code)=(transfers ) is duplicated.
What's strange is that when I execute the table.sql file and the index.sql file together and load the data last, I get no errors and it all works.
Is there something I am missing? Why won't it let me create the unique indexes after the data has been loaded?
There are two rows whose ug_code is "transfers ", and that's why it can't create the index.
As for why it succeeds when you create the index first, I'm not sure. My suspicion is that the second time it tries to insert "transfers " into the database, that particular insert fails while the rest of the data is inserted successfully.
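Both behaviours can be reproduced with Python's sqlite3 module standing in for PostgreSQL (an assumption: the error text differs, but the ordering effect is the same). The sketch assumes the batch script inserts row by row and carries on after a failed insert, which is what the answer suspects: with the index in place, the duplicate "transfers " row is rejected; without it, the later CREATE UNIQUE INDEX fails. The GROUP BY ... HAVING query in find_duplicates is plain SQL and can also be run in psql to list the offending keys. The function names are hypothetical:

```python
import sqlite3


def find_duplicates(conn):
    """Return (ug_code, count) pairs for codes that occur more than once."""
    return conn.execute(
        "SELECT ug_code, count(*) FROM ugroups "
        "GROUP BY ug_code HAVING count(*) > 1").fetchall()


def load(conn, rows, index_first):
    """Create ugroups, then load rows and build the unique index in the given order.

    Returns the number of rows actually inserted.
    """
    conn.execute("CREATE TABLE ugroups (ug_code char(10) NOT NULL, "
                 "ug_desc char(60) NOT NULL)")
    if index_first:
        conn.execute("CREATE UNIQUE INDEX i_ugroups ON ugroups (ug_code)")
    inserted = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO ugroups VALUES (?, ?)", row)
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # the unique index rejects the duplicate row; loading continues
    if not index_first:
        # Raises sqlite3.IntegrityError if the loaded data contains duplicates.
        conn.execute("CREATE UNIQUE INDEX i_ugroups ON ugroups (ug_code)")
    return inserted
```

With rows containing "transfers " twice, load(..., index_first=True) inserts all but the second "transfers " row, while index_first=False raises IntegrityError at the CREATE UNIQUE INDEX step.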

How do I output the results of a HiveQL query to CSV?

We would like to write the results of a Hive query to a CSV file. I thought the command should look like this:
insert overwrite directory '/home/output.csv' select books from table;
When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?
Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.
According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.
Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.
A slight modification (adding the LOCAL keyword) will store the data in a local directory.
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;
When I run a similar query, here's what the output looks like.
[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select books from table' > /home/lvermeer/temp.tsv
That gives me a tab-separated file that I can use. Hope that is useful for you as well.
Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
If you want a CSV file, you can modify Lukas' solution as follows (assuming you are on a Linux box). Note that this replaces every run of whitespace with a comma, including spaces inside values:
hive -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv
This is the most CSV-friendly way I found to output the results of HiveQL.
You don't need any grep or sed commands to format the data; Hive supports it directly, you just need to add the extra --outputformat option.
hive --outputformat=csv2 -e 'select * from <table_name> limit 20' > /path/toStore/data/results.csv
You should use CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.
You also might have to do some trickery to convert the files from '\001' - delimited to CSV. You could use a custom CSV SerDe or postprocess the extracted file.
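A minimal post-processing sketch for the '\001'-delimited files, assuming no field contains an embedded newline. hive_text_to_csv is a hypothetical name; Python's csv module takes care of quoting fields that contain commas. Hive writes NULL as \N by default, and this sketch passes that through unchanged:

```python
import csv
import sys


def hive_text_to_csv(lines, out):
    """Convert Hive's default text output (fields separated by \\x01) to CSV rows."""
    writer = csv.writer(out)  # quotes fields containing commas or quotes as needed
    for line in lines:
        writer.writerow(line.rstrip("\n").split("\x01"))


if __name__ == "__main__":
    # Usage: python hive_to_csv.py 000000_0 [000001_0 ...] > result.csv
    for path in sys.argv[1:]:
        with open(path, newline="") as f:
            hive_text_to_csv(f, sys.stdout)
```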
You can use INSERT … DIRECTORY …, as in this example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees se
WHERE se.state = 'CA';
OVERWRITE and LOCAL have the same interpretations as before and paths are interpreted following the usual rules. One or more files will be written to /tmp/ca_employees, depending on the number of reducers invoked.
If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.
I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace (space, newline, tab) chars and commas.
To make the column data tsv safe, I replaced all \t chars in the column data with a space, and executed python code on the commandline to generate a csv file, as shown below:
hive -e 'tab_replaced_hql_query' | python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin, dialect=csv.excel_tab);writer = csv.writer(sys.stdout, dialect=csv.excel)\nfor row in reader: writer.writerow(row)")'
This created a perfectly valid csv. Hope this helps those who come looking for this solution.
You can use hive string function CONCAT_WS( string delimiter, string str1, string str2...strn )
For example (the query is wrapped in double quotes because it contains single quotes itself):
hive -e "select CONCAT_WS(',',cola,colb,colc...,coln) from Mytable" > /home/user/Mycsv.csv
I had a similar issue and this is how I was able to address it.
Step 1 - Loaded the data from Hive table into another table as follows
DROP TABLE IF EXISTS TestHiveTableCSV;
CREATE TABLE TestHiveTableCSV
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' AS
SELECT Column List FROM TestHiveTable;
Step 2 - Copied the blob from Hive warehouse to the new location with appropriate extension
Start-AzureStorageBlobCopy `
-DestContext $destContext `
-SrcContainer "Source Container" `
-SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0" `
-DestContainer "Destination Container" `
-DestBlob "CSV/TestHiveTable.csv"
hive --outputformat=csv2 -e "select * from yourtable" > my_file.csv
or
hive --outputformat=csv2 -e "select * from yourtable" > [your_path]/file_name.csv
For TSV, just change csv to tsv in the commands above (csv2 becomes tsv2) and run them.
The default separator is "^A". In Python, it is written "\x01".
When I want to change the delimiter, I use SQL like:
SELECT col1, delimiter, col2, delimiter, col3, ..., FROM table
Then, regard delimiter+"^A" as a new delimiter.
I tried various options, but this is one of the simplest solutions for Python pandas:
hive -e 'select books from table' | grep "|" > temp.csv
import pandas as pd
df = pd.read_csv("temp.csv", sep='|')
You can also use tr "|" "," to convert "|" to ","
Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as csv.
If you are doing this from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.
It will:
Login to bastion host.
pbrun.
kinit.
beeline (with your query).
Save echo from beeline to a file on Windows.
Execute it like this:
set PROXY_HOST=your_bastion_host
set SERVICE_USER=you_func_user
set LINUX_USER=your_SOID
set LINUX_PWD=your_pwd
python hh.py --query_file=query.sql
To cover the steps that follow after kicking off the query:
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
In my case, the data generated under the temp folder is in deflate format, and it looks like this:
$ ls
000000_0.deflate
000001_0.deflate
000002_0.deflate
000003_0.deflate
000004_0.deflate
000005_0.deflate
000006_0.deflate
000007_0.deflate
Here's the command to unzip the deflate files and put everything into one csv file:
hadoop fs -text "file:///home/lvermeer/temp/*" > /home/lvermeer/result.csv
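If the hadoop command is not available on the machine holding the files, the parts can also be decompressed with Python's zlib module. This assumes each .deflate part is a single zlib stream, which as far as I know is what Hadoop's default codec writes; merge_deflate_parts is a hypothetical helper, and the parts are concatenated in file-name order:

```python
import glob
import os
import zlib


def merge_deflate_parts(directory, out_path, chunk_size=1 << 20):
    """Decompress every *.deflate file in `directory` and concatenate the results."""
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(os.path.join(directory, "*.deflate"))):
            decomp = zlib.decompressobj()  # default wbits expects a zlib header
            with open(part, "rb") as f:
                # Stream in chunks so large parts are not read into memory at once.
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    out.write(decomp.decompress(chunk))
            out.write(decomp.flush())
```

For example, merge_deflate_parts("/home/lvermeer/temp", "/home/lvermeer/result.csv") produces the same single file as the hadoop fs -text command, provided the assumption about the stream format holds.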
I may be late to this one, but this might still help:
echo "COL_NAME1|COL_NAME2|COL_NAME3|COL_NAME4" > SAMPLE_Data.csv
hive -e '
select distinct concat(COL_1, "|",
COL_2, "|",
COL_3, "|",
COL_4)
from table_Name where clause if required;' >> SAMPLE_Data.csv
This shell command writes the output in CSV format to output.txt, without the column headers.
$ hive --outputformat=csv2 -f 'hivedatascript.hql' --hiveconf hive.cli.print.header=false > output.txt
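Use the command below (note that hive -e writes tab-separated output, so the .csv file will actually contain tab-separated values):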
hive -e "use [database_name]; select * from [table_name] LIMIT 10;" > /path/to/file/my_file_name.csv
I had a huge dataset whose details I was trying to organize, to determine the types of attacks and the number of each type. An example from my practice that worked (with a few more details) goes something like this:
hive -e "use DataAnalysis;
select attack_cat,
case when attack_cat == 'Backdoor' then 'Backdoors'
when length(attack_cat) == 0 then 'Normal'
when attack_cat == 'Backdoors' then 'Backdoors'
when attack_cat == 'Fuzzers' then 'Fuzzers'
when attack_cat == 'Generic' then 'Generic'
when attack_cat == 'Reconnaissance' then 'Reconnaissance'
when attack_cat == 'Shellcode' then 'Shellcode'
when attack_cat == 'Worms' then 'Worms'
when attack_cat == 'Analysis' then 'Analysis'
when attack_cat == 'DoS' then 'DoS'
when attack_cat == 'Exploits' then 'Exploits'
when trim(attack_cat) == 'Fuzzers' then 'Fuzzers'
when trim(attack_cat) == 'Shellcode' then 'Shellcode'
when trim(attack_cat) == 'Reconnaissance' then 'Reconnaissance' end,
count(*) from actualattacks group by attack_cat;">/root/data/output/results2.csv

Using COPY FROM stdin to load tables, reading input file only once

I've got a large (~60 million row) fixed-width source file with ~1800 records per row.
I need to load this file into 5 different tables on an instance of Postgres 8.3.9.
My dilemma is that, because the file is so large, I'd like to have to read it only once.
This is straightforward enough using INSERT or COPY as normal, but I'm trying to get a load speed boost by putting my COPY FROM statements in a transaction that also includes a TRUNCATE. This avoids logging, which is supposed to speed up the load considerably (according to http://www.cirrusql.com/node/3). As I understand it, you can disable logging in Postgres 9.x, but I don't have that option on 8.3.9.
The script below reads the input file twice, which I want to avoid. Any ideas on how I could accomplish this by reading the input file only once? It doesn't have to be bash; I also tried psycopg2, but couldn't figure out how to stream file output into the COPY statement the way I'm doing below. I can't COPY FROM a file because I need to parse it on the fly.
#!/bin/bash
table1="copytest1"
table2="copytest2"
#note: $1 refers to the first argument used when invoking this script
#which should be the location of the file one wishes to have python
#parse and stream out into psql to be copied into the data tables
( echo 'BEGIN;'
echo 'TRUNCATE TABLE ' ${table1} ';'
echo 'COPY ' ${table1} ' FROM STDIN'
echo "WITH NULL AS '';"
cat $1 | python2.5 ~/parse_${table1}.py
echo '\.'
echo 'TRUNCATE TABLE ' ${table2} ';'
echo 'COPY ' ${table2} ' FROM STDIN'
echo "WITH NULL AS '';"
cat $1 | python2.5 ~/parse_${table2}.py
echo '\.'
echo 'COMMIT;'
) | psql -U postgres -h chewy.somehost.com -p 5473 -d db_name
exit 0
Thanks!
You could use named pipes instead of your anonymous pipe.
With this concept your python script could fill the tables through different psql processes with the corresponding data.
Create pipes:
mkfifo fifo_table1
mkfifo fifo_table2
Run psql instances:
psql db_name < fifo_table1 &
psql db_name < fifo_table2 &
Your Python script would look roughly like this (parse() is a placeholder for your own parsing code):
SQL_BEGIN = """BEGIN;
TRUNCATE TABLE %s;
COPY %s FROM STDIN WITH NULL AS '';
"""
fifo1 = open('fifo_table1', 'w')
fifo2 = open('fifo_table2', 'w')
bigfile = open('mybigfile', 'r')
fifo1.write(SQL_BEGIN % ('table1', 'table1'))  # with Python 2.6+ you could use str.format()
fifo2.write(SQL_BEGIN % ('table2', 'table2'))
for line in bigfile:
    # parse() is a placeholder: it decides which table the line belongs to
    # and returns that table's number plus the tab-separated COPY row
    target, data = parse(line)
    if target == 1:
        fifo1.write(data + '\n')
    else:
        fifo2.write(data + '\n')
# a line containing only \. ends the COPY data; only then can COMMIT be sent
fifo1.write('\\.\nCOMMIT;\n')
fifo2.write('\\.\nCOMMIT;\n')
fifo1.close()
fifo2.close()
bigfile.close()
Maybe this is not the most elegant solution, but it should work.
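Another way to read the big file only once, without FIFOs, is to parse it in a single pass into one intermediate file per table and then COPY each file inside the same transaction. A minimal sketch of the single pass; route is a hypothetical stand-in for the logic in parse_copytest1.py and parse_copytest2.py, returning (output key, COPY row) pairs so that one input line can feed several tables:

```python
def split_once(lines, route, outputs):
    """Read `lines` once, writing each parsed row to the output chosen by `route`.

    `route(line)` (hypothetical) returns a list of (output_key, copy_row) pairs;
    `outputs` maps each key to a writable text file object.
    Returns the number of rows written per key.
    """
    counts = dict.fromkeys(outputs, 0)
    for line in lines:
        for key, row in route(line):
            outputs[key].write(row + "\n")
            counts[key] += 1
    return counts
```

With psycopg2, each intermediate file can then be streamed with cursor.copy_expert("COPY copytest1 FROM STDIN WITH NULL AS ''", f) after the matching TRUNCATE, committing once at the end. The trade-off is the extra disk space for the intermediate files.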
Why use COPY for the second table? I would assume that doing:
INSERT INTO table2 (...)
SELECT ...
FROM table1;
would be faster than using COPY.
Edit
If you need to import different rows into different tables but from the same source file, maybe inserting everything into a staging table and then inserting the rows from there into the target tables is faster:
Import the whole text file into one staging table:
COPY staging_table FROM STDIN ...;
After that step, the whole input file is in staging_table.
Then copy the rows from the staging table to the individual target tables by selecting only those that qualify for the corresponding table:
INSERT INTO table_1 (...)
SELECT ...
FROM staging_table
WHERE (conditions for table_1);
INSERT INTO table_2 (...)
SELECT ...
FROM staging_table
WHERE (conditions for table_2);
This is of course only feasible if you have enough space in your database to keep the staging table around.
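A small end-to-end sketch of the staging-table approach, with Python's sqlite3 standing in for PostgreSQL (so executemany replaces COPY staging_table FROM STDIN). The single payload column and the routing rule, a leading 'A' goes to table_1 and everything else to table_2, are made-up examples of the "conditions for table_N":

```python
import sqlite3


def load_via_staging(conn, lines):
    """Load every input line into a staging table, then fan out with INSERT ... SELECT.

    Assumes table_1 and table_2 already exist with a single TEXT column `payload`.
    """
    conn.execute("CREATE TEMP TABLE staging_table (line TEXT)")
    # Single pass over the input: everything goes into the staging table.
    conn.executemany("INSERT INTO staging_table (line) VALUES (?)",
                     ((l.rstrip("\n"),) for l in lines))
    # Hypothetical routing rule: first character 'A' -> table_1, otherwise table_2;
    # the payload is the rest of the line (substr is 1-based in SQLite).
    conn.execute("INSERT INTO table_1 (payload) "
                 "SELECT substr(line, 2) FROM staging_table WHERE substr(line, 1, 1) = 'A'")
    conn.execute("INSERT INTO table_2 (payload) "
                 "SELECT substr(line, 2) FROM staging_table WHERE substr(line, 1, 1) <> 'A'")
    conn.execute("DROP TABLE staging_table")
    conn.commit()
```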
