Is there any way to specify clickhouse-client timeout? - database

I'm inserting a lot of CSVs into a ClickHouse database. Sometimes it gets stuck on one of the files, or something goes wrong with the remote server I'm inserting to, so it waits the default amount of time and then outputs: Code: 209. DB::NetException: Timeout exceeded while reading from socket (ip, 300000 ms): while receiving packet from ip:9000: (in query: ...). (SOCKET_TIMEOUT)
Is there any way to specify this timeout so I don't have to wait for 5 minutes? I'm inserting with a script like this:
clickhouse-client --host "Host" --database "db" --port 9000 --user "User" --password "Password" --query "INSERT INTO table FORMAT CSV" < "file.csv"

You can try --receive_timeout.
There are a bunch of timeout options available in clickhouse-client. Check this command:
clickhouse-client --help | grep timeout
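For example, a minimal sketch of the same INSERT as above but with a shorter receive timeout (the value 60 is an assumption for illustration; check the exact option names and units that your clickhouse-client --help reports):
clickhouse-client --receive_timeout=60 --host "Host" --database "db" --port 9000 --user "User" --password "Password" --query "INSERT INTO table FORMAT CSV" < "file.csv"
With a setting like this the client should give up after roughly a minute instead of the default five.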

Related

Clickhouse-client insert optimization

I'm inserting a lot of CSV data files into a remote ClickHouse database that already has a lot of data. I'm doing it with a simple script like this:
...
for j in *.csv; do
clickhouse-client --max_insert_threads=32 --receive_timeout=30000 --input_format_allow_errors_num=999999999 --host "..." --database "..." --port 9000 --user "..." --password "..." --query "INSERT INTO ... FORMAT CSV" < "$j"
done
...
So my question is: how can I optimize these inserts? I already used these options for optimization:
--max_insert_threads=32 --receive_timeout=30000
Are there any more options in clickhouse-client I should use for better performance, and for what purpose? One file can be around 300-500 MB (and sometimes more). According to this article, using parallel processes won't help, which is why I'm inserting one file at a time.
max_insert_threads is not applicable here; it only affects INSERT ... SELECT statements executed inside the ClickHouse server (see the sketch below).
As for "According to this article using parallel processes won't help":
It should help (it depends on CPU and disk power), just try:
# parallelism 6 (-P6)
find . -type f -name '*.csv' | xargs -P 6 -n 1 clickhouse-client --input_format_parallel_parsing=0 --receive_timeout=30000 --input_format_allow_errors_num=999999999 --host "..." --database "..." --port 9000 --user "..." --password "..." --query "INSERT INTO ... FORMAT CSV"
I set input_format_parallel_parsing=0 deliberately; it improves total performance when multiple loads run in parallel.
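Coming back to max_insert_threads, here is a rough sketch of the kind of statement it is actually meant for, i.e. an INSERT ... SELECT executed inside the server (table names are placeholders, not from the question):
clickhouse-client --host "..." --query "INSERT INTO target_table SELECT * FROM source_table SETTINGS max_insert_threads = 8"
For CSV loads sent from the client, as in your script, it has no effect.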

Check database connectivity

I'm writing a Unix script to check database connectivity on a server. When my database connection errors out, or when there is a delay in connecting to the database, I want the output to be "Not connected". If it connects, my output should be "Connected". It is an Oracle database.
When there is a delay in database connectivity, my code does not work and my script hangs. What changes should I make so that it handles both conditions (an error connecting to the database, and a delay in connecting to the database)?
if sqlplus $DB_USER/$DB_PASS#$DB_INSTANCE < /dev/null | grep 'Connected to'; then
    echo "Connectivity is OK"
else
    echo "No Connectivity"
fi
The first thing to add to your code is a timeout. Checking database connectivity is not easy, and there can be all kinds of problems in the various layers your connection passes through. A timeout gives you the option to break out of a hanging session and continue the task, reporting that the connection failed.
googleFu gave me a few nice examples:
Timeout a command in bash without unnecessary delay
If you are using Linux, you can use the timeout command to do what you want. So the following will have three outcomes, setting the variable RC as follows:
"Connected to" successful: RC set to 0
"Connected to" not found: RC set to 1
sqlplus command timed out after 5 minutes: RC set to 124
WAIT_MINUTES=5
SP_OUTPUT=$(timeout ${WAIT_MINUTES}m sqlplus $DB_USER/$DB_PASS#$DB_INSTANCE < /dev/null)
CMD_RC=$?
if [ $CMD_RC -eq 124 ]
then
    ERR_MSG="Connection attempt timed out after $WAIT_MINUTES minutes"
    RC=$CMD_RC
else
    echo "$SP_OUTPUT" | grep -q 'Connected to'
    GREP_RC=$?
    if [ $GREP_RC -eq 0 ]
    then
        echo "Connectivity is OK"
        RC=0
    else
        ERR_MSG="Connectivity or user information is bad"
        RC=1
    fi
fi
if [ $RC -gt 0 ]
then
    # Add code to send email with subject of $ERR_MSG and body of $SP_OUTPUT
    echo "Need to email someone about $ERR_MSG"
fi
exit $RC
I'm sure there are several improvements to this, but this will get you started.
Briefly, we use the timeout command to wait the specified time for the sqlplus command to run. I split the grep into a separate command to allow the use of timeout and to allow more flexibility in checking additional text messages.
There are several examples on StackOverflow on sending email from a Linux script.
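As a sketch of that emailing step (assuming a mail/mailx-style command is installed on the box; the recipient address is a placeholder):
echo "$SP_OUTPUT" | mail -s "$ERR_MSG" dba-team@example.com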

How to solve MongoDB error "Too many open files" without root permission

I'm trying to insert several JSON files into MongoDB collections using a shell script as follows:
#!/bin/bash
NUM=50000
for ((i=0;i<NUM;i++))
do
mongoimport --host localhost --port 27018 -u 'admin' -p 'password' --authenticationDatabase 'admin' -d random_test -c tri_${i} /home/test/json_files/json_${i}.csv --jsonArray
done
After several successful imports, these errors were shown on the terminal:
Failed: connection(localhost:27017[-3]), incomplete read of message header: EOF
error connecting to host: could not connect to server:
server selection error: server selection timeout,
current topology: { Type: Single, Servers:
[{ Addr: localhost:27017, Type: Unknown,
State: Connected, Average RTT: 0, Last error: connection() :
dial tcp [::1]:27017: connect: connection refused }, ] }
And below are the error messages from mongo.log, which say "Too many open files". Can I somehow limit the thread number, or what should I do to fix it?
2020-07-21T11:13:33.613+0200 E STORAGE [conn971] WiredTiger error (24) [1595322813:613873][53971:0x7f7c8d228700], WT_SESSION.create: __posix_directory_sync, 151: /home/mongodb/bin/data/db/index-969--7295385362343345274.wt: directory-sync: Too many open files Raw: [1595322813:613873][53971:0x7f7c8d228700], WT_SESSION.create: __posix_directory_sync, 151: /home/mongodb/bin/data/db/index-969--7295385362343345274.wt: directory-sync: Too many open files
2020-07-21T11:13:33.613+0200 E STORAGE [conn971] WiredTiger error (-31804) [1595322813:613892][53971:0x7f7c8d228700], WT_SESSION.create: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1595322813:613892][53971:0x7f7c8d228700], WT_SESSION.create: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
2020-07-21T11:13:33.613+0200 F - [conn971] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 414
2020-07-21T11:13:33.613+0200 F - [conn971]
***aborting after fassert() failure
Checking the open file limit with ulimit -n shows 1024. I then tried to raise the limit with ulimit -n 50000, but the account I use on the remote server doesn't have permission to do that. Can I somehow close the files once the importing is done, or is there any other way to raise the open file limit without root permission? Thanks a lot!
Env: Redhat, mongoDB
You can't. Resource limits exist precisely to cap how many resources non-privileged users (which yours is) can consume. You need to reconfigure the system to raise the limit, and that requires root privileges.
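For reference, a rough sketch of what an administrator with root access would typically change on a Red Hat-style system (the user name and values are illustrative, not taken from the question):
# As root: raise the open-file limit for the user that runs mongod
echo "mongod soft nofile 64000" >> /etc/security/limits.conf
echo "mongod hard nofile 64000" >> /etc/security/limits.conf
# If mongod is managed by systemd, a LimitNOFILE= override in the service unit
# (followed by a daemon-reload and a restart of the service) may be needed instead.
Without that level of access, the practical route is to ask the administrator to raise the limit.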

What's the recommended way to use pg_dump/ysql_dump with YugabyteDB to export data when a table is still receiving inserts?

When this workload (SqlSecondaryIndex workload from https://github.com/YugaByte/yb-sample-apps/) is still running
% java -jar yb-sample-apps.jar --workload SqlSecondaryIndex \
    --nodes 127.0.0.1:5433 --num_threads_read 4 --num_threads_write 2
an attempt to use ysql_dump to export the table causes "Query error: Restart read required" error.
$ ./ysql_dump -h 127.0.0.1 -d postgres --data-only --table sqlsecondaryindex -f out.txt
ysql_dump: Dumping the contents of table "sqlsecondaryindex" failed: PQgetResult() failed.
ysql_dump: Error message from server: ERROR: Query error: Restart read required at: { read: { physical: 1592265362684030 } local_limit: { physical: 1592265375906038 } global_limit: <min> in_txn_limit: <max> serial_no: 0 }
But if the same command is executed when the workload is stopped, the ysql_dump command completes successfully without any issues. Is this expected behavior?
To read against a consistent snapshot and avoid running into the "read restart" error, pass the --serializable-deferrable option to ysql_dump. For example:
~/tserver/postgres/bin/ysql_dump -h 127.0.0.1 -d postgres \
--data-only --table sqlsecondaryindex \
--serializable-deferrable -f data1.csv

PostgreSQL \lo_import and how to get the resulting OID into an UPDATE command?

I'm working with Postgres 9.0, and I have an application where I need to insert images into a database on a remote server. So I use:
"C:\Program Files\PostgreSQL\9.0\bin\psql.exe" -h 192.168.1.12 -p 5432 -d myDB -U my_admin -c "\lo_import 'C://im/zzz4.jpg'";
where
192.168.1.12 is the IP address of the server system
5432 is the port number
myDB is the server database name
my_admin is the username
"\lo_import 'C://im/zzz4.jpg'" is the query that is fired.
After the image has been inserted into the database I need to update a row in a table like this:
UPDATE species
SET speciesimages = 17755 -- OID from the previous command; how do I get that OID?
WHERE species = 'ACCOAA';
So my question is: how do I get the OID returned after the \lo_import in psql?
I tried running \lo_import 'C://im/zzz4.jpg' in Postgres but I get an error:
ERROR: syntax error at or near ""\lo_import 'C://im/zzz4.jpg'""
LINE 1: "\lo_import 'C://im/zzz4.jpg'"
I also tried this:
update species
set speciesimages=\lo_import 'C://im/zzz4.jpg'
where species='ACAAC04';
But I get this error:
ERROR: syntax error at or near "\"
LINE 2: set speciesimages=\lo_import 'C://im/zzz4.jpg'
^
As your file resides on your local machine and you want to import the blob to a remote server, you have two options:
1) Transfer the file to the server and use the server-side function:
UPDATE species
SET speciesimages = lo_import('/path/to/server-local/file/zzz4.jpg')
WHERE species = 'ACAAC04';
2) Use the psql meta-command as you have it.
But you cannot mix psql meta-commands with SQL commands; that's impossible.
Use the psql variable :LASTOID in an UPDATE command that you launch immediately after the \lo_import meta command in the same psql session:
UPDATE species
SET speciesimages = :LASTOID
WHERE species = 'ACAAC04';
To script that (works in Linux, I am not familiar with Windows shell scripting):
echo "\lo_import '/path/to/my/file/zzz4.jpg' \\\\ UPDATE species SET speciesimages = :LASTOID WHERE species = 'ACAAC04';" | \
psql -h 192.168.1.12 -p 5432 -d myDB -U my_admin
\\ is the separator meta-command. You need to double each \ inside the double-quoted string because the shell interprets one layer.
\ before the newline is just the line continuation in Linux shells.
Alternative syntax (tested on Linux again):
psql -h 192.168.1.12 -p 5432 -d myDB -U my_admin << EOF
\lo_import '/path/to/my/file/zzz4.jpg'
UPDATE species
SET speciesimages = :LASTOID
WHERE species = 'ACAAC04';
EOF
After importing an image with this command:
\lo_import '$imagePath' '$imageName'
You can then find the description of the binary by querying the pg_catalog.pg_largeobject_metadata table, which stores the oid value you need. I.e.:
"SELECT oid as `"ID`",
pg_catalog.obj_description(oid, 'pg_largeobject') as `"Description`"
FROM pg_catalog.pg_largeobject_metadata WHERE pg_catalog.obj_description(oid,'pg_largeobject') = '$image' limit 1 "
Here's how to do it if your field is type bytea.
\lo_import '/cygdrive/c/Users/Chloe/Downloads/Contract.pdf'
update contracts set contract = lo_get(:LASTOID) where id = 77;
Use \lo_list and \lo_unlink after you import.
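For instance, a minimal sketch of that cleanup from a shell, in the same heredoc style as above (17755 is an illustrative OID; use the value \lo_list reports):
psql -h 192.168.1.12 -p 5432 -d myDB -U my_admin << EOF
\lo_list
\lo_unlink 17755
EOF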
