ClickHouse - import is forcibly killed by the system

When I import a 120 GB text file into ClickHouse containing about 400 million rows, the import is killed after more than 100 million rows have been loaded.
The import statement is as follows:
clickhouse-client --user default --password xxxxx --port 9000 -hbd4 --database="dbs" --input_format_allow_errors_ratio=0.1 --query="insert into ... FORMAT CSV" < /1.csv
The error is as follows:
2021.04.29 10:20:23.135790 [ 19694 ] {} <Fatal> Application: Child process was terminated by signal 9 (KILL). If it is not done by 'forcestop' command or manually, the possible cause is OOM Killer (see 'dmesg' and look at the '/var/log/kern.log' for the details).
Is the imported file too large and exhausting the memory? Should I split the file into smaller pieces?

Take a look at the system logs - they should have some clues:
As suggested in the error message, run dmesg and see if there's any mention of the OOM Killer (the kernel's self-protection mechanism that triggers on out-of-memory events). If that's the case, you're out of memory or you've granted too much memory to ClickHouse.
See what ClickHouse's own logs tell you. The path to the log file is defined in clickhouse-server/config.xml, under yandex/logger/log - it's likely /var/log/clickhouse-server/clickhouse-server.log plus /var/log/clickhouse-server/clickhouse-server.err.log.
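For example, a quick way to check both (a minimal sketch, assuming the default log paths mentioned above; adjust if your config.xml says otherwise):
# look for OOM Killer activity in the kernel logs
dmesg -T | grep -i -E 'out of memory|killed process'
grep -i 'killed process' /var/log/kern.log
# check what the ClickHouse server logged around that moment
tail -n 200 /var/log/clickhouse-server/clickhouse-server.err.log
tail -n 200 /var/log/clickhouse-server/clickhouse-server.log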

Related

How to track Icecast2 visits with Matomo?

My beloved web radio has an Icecast2 instance and it just works. We also have a Matomo instance to track visits on our WordPress website, using only Free/Libre and open source software.
The main issue is that, since Matomo tracks visits via JavaScript, direct visits to the web-radio stream are not intercepted by Matomo by default.
How to use Matomo to track visits to Icecast2 audio streams?
Yep, it's possible. Here's my way.
First of all, try the Matomo internal import script. Be sure to set your --idsite= and the correct path to your Matomo installation:
su www-data -s /bin/bash
python2.7 /var/www/matomo/misc/log-analytics/import_logs.py --show-progress --url=https://matomo.example.com --idsite=1 --recorders=2 --enable-http-errors --log-format-name=icecast2 --strip-query-string /var/log/icecast2/access.log
NOTE: If you see this error
[INFO] Error when connecting to Matomo: HTTP Error 400: Bad Request
In that case, make sure all the needed plugins are activated:
Administration > System > Plugins > Bulk plugin
So, if the script works, it should start printing something like this:
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/log/icecast2/access.log...
1013 lines parsed, 200 lines recorded, 99 records/sec (avg), 200 records/sec (current)
If so, immediately stop the script to avoid importing duplicate entries before installing the definitive solution.
To stop the script use CTRL+C.
Now we need to run this script every time the log is rotated, just before rotation.
The official documentation suggests a crontab, but I don't recommend that solution; I suggest configuring logrotate instead.
Configure the file /etc/logrotate.d/icecast2. From:
/var/log/icecast2/*.log {
...
weekly
...
}
To:
/var/log/icecast2/*.log {
...
daily
prerotate
su www-data -s /bin/bash --command 'python2.7 ... /var/log/icecast2/access.log' > /var/log/logrotate-icecast2-matomo.log
endscript
...
}
IMPORTANT: In the above example replace ... with the right command.
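For example (assuming the same Matomo path and parameters as the manual run above; adjust --idsite= and the paths to your installation), the prerotate command could look like this:
su www-data -s /bin/bash --command 'python2.7 /var/www/matomo/misc/log-analytics/import_logs.py --show-progress --url=https://matomo.example.com --idsite=1 --recorders=2 --enable-http-errors --log-format-name=icecast2 --strip-query-string /var/log/icecast2/access.log' > /var/log/logrotate-icecast2-matomo.log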
Now you can also try it manually:
logrotate -vf /etc/logrotate.d/icecast2
From another terminal you should be able to see its result in real-time with:
tail -f /var/log/logrotate-icecast2-matomo.log
If it works, everything will run automatically from now on, importing all visits every day, without duplicates and without missing any lines.
More documentation here about the import script itself:
https://github.com/matomo-org/matomo-log-analytics
More documentation here about logrotate:
https://linux.die.net/man/8/logrotate

ClickHouse - OOM - when the ClickHouse data import is executed, the program is killed by the system

When I was running a ClickHouse data import, the program was killed by the system.
The execution command is as follows:
clickhouse-client --user default --password xxxxx --port 9000 -hxxx --database="bds" --input_format_allow_errors_ratio=0.1 --query="insert into ... FORMAT CSV" < /1.csv
The error reported is:
[2478416.927226] Out of memory: Kill process 19696 (clickhouse-serv) score 12 or sacrifice child [2478416.928855] Killed process 19696 (clickhouse-serv), UID 0, total-vm:46668916kB, anon-rss:1008480kB, file-rss:0kB, shmem-rss:0kB
The data file is 122 GB and the server has 128 GB of memory. How can I get the clickhouse-client command to import it with reasonable memory usage? I hope you can help me, thank you!
Most probably it's a bug. Which ClickHouse version do you use?
Try adding the parameters --input_format_parallel_parsing=0 --optimize_on_insert=0
--input_format_parallel_parsing arg Enable parallel parsing for some data formats.
--optimize_on_insert arg Do the same transformation for inserted block of data as if merge was done on this block.
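For example, the original import command with those two settings added would look like this (same placeholders as in the question):
clickhouse-client --user default --password xxxxx --port 9000 -hxxx --database="bds" --input_format_allow_errors_ratio=0.1 --input_format_parallel_parsing=0 --optimize_on_insert=0 --query="insert into ... FORMAT CSV" < /1.csv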

How to solve MongoDB error "Too many open files" without root permission

I am trying to insert several JSON files into MongoDB collections using a shell script like the following:
#!/bin/bash
NUM=50000
for ((i=0;i<NUM;i++))
do
mongoimport --host localhost --port 27018 -u 'admin' -p 'password' --authenticationDatabase 'admin' -d random_test -c tri_${i} /home/test/json_files/json_${i}.csv --jsonArray
done
After several successful imports, these errors were shown on the terminal:
Failed: connection(localhost:27017[-3]), incomplete read of message header: EOF
error connecting to host: could not connect to server:
server selection error: server selection timeout,
current topology: { Type: Single, Servers:
[{ Addr: localhost:27017, Type: Unknown,
State: Connected, Average RTT: 0, Last error: connection() :
dial tcp [::1]:27017: connect: connection refused }, ] }
And below are the error messages from mongo.log, which say "Too many open files". Can I somehow limit the number of threads, or what should I do to fix it? Thanks a lot!
2020-07-21T11:13:33.613+0200 E STORAGE [conn971] WiredTiger error (24) [1595322813:613873][53971:0x7f7c8d228700], WT_SESSION.create: __posix_directory_sync, 151: /home/mongodb/bin/data/db/index-969--7295385362343345274.wt: directory-sync: Too many open files Raw: [1595322813:613873][53971:0x7f7c8d228700], WT_SESSION.create: __posix_directory_sync, 151: /home/mongodb/bin/data/db/index-969--7295385362343345274.wt: directory-sync: Too many open files
2020-07-21T11:13:33.613+0200 E STORAGE [conn971] WiredTiger error (-31804) [1595322813:613892][53971:0x7f7c8d228700], WT_SESSION.create: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1595322813:613892][53971:0x7f7c8d228700], WT_SESSION.create: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
2020-07-21T11:13:33.613+0200 F - [conn971] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 414
2020-07-21T11:13:33.613+0200 F - [conn971]
***aborting after fassert() failure
Checking the open file limit with ulimit -n shows 1024. I then tried to raise the limit with ulimit -n 50000, but the account I use on the remote server doesn't have permission to do that. Can I somehow close each file once its import is done, or is there any other way to raise the open file limit without root permission? Thanks a lot!
Env: Red Hat, MongoDB
You can't. The reason resource limits exist is to limit how many resources non-privileged users (which yours is) can consume. You need to reconfigure the system to adjust this, which requires root privileges.
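For reference, the reconfiguration an administrator with root access would typically do looks something like this (a sketch only; the exact mechanism depends on the distribution and on whether mongod runs under systemd, and the user name and value here are examples):
# /etc/security/limits.conf - raise the open-file limit for the user running mongod
mongod soft nofile 64000
mongod hard nofile 64000
# or, for a systemd-managed service, set LimitNOFILE=64000 in the unit file and restart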

libssh2 - channel read is hanging

I'm currently developing a remote job scheduler in Perl.
It has to connect via SSH to a number of servers and execute already defined jobs/job groups.
I use Net::SSH2, which is built upon libssh2.
My program usually works fine with 400-500 servers, but when I try to run a basic uptime command on 1000 servers, one or more of my threads hangs and either never finishes, or only finishes some 30 minutes later.
It's random: sometimes it finishes on time, sometimes not.
I tracked the problem down to this Net::SSH2 call: $in .= $buf while $chan->read( $buf, 10240 );
Here is the full code of the connection :
my $chan = $this->{netssh2}->channel() or die $!;
$chan->blocking(1);
$chan->exec($command);
my ($in,$err,$buf,$buf_err);
$in .= $buf while $chan->read( $buf, 10240 );
$err .= $buf_err while $chan->read( $buf_err, 10240, 1 );
$chan->send_eof;
1 while !$chan->eof;
$chan->wait_closed;
I then downloaded the Net::SSH2 source package and modified the C/Perl binding (XS) file.
It showed me that the problem comes from this line :
count = libssh2_channel_read_ex(ch->channel, XLATEXT, pv_buffer, size);
This function comes from the libssh2 library: http://www.libssh2.org/libssh2_channel_read_ex.html
Sometimes (about 1 in 1000 times) the program enters this read and never leaves. The affected servers are different most of the time.
Do you have any idea what I should be looking for/checking ?
I've been working on this for a few days; I'd very much appreciate some external advice :)

Invalid value "zookeeper" for flag -a: valid streams are STDIN, STDOUT and STDERR

I am trying to follow this blog to set up SolrCloud with Docker:
https://lucidworks.com/blog/solrcloud-on-docker/
I was able to create the zookeeper image successfully. The docker images command lists the image too.
However, when I try to create and run the zookeeper container with the following command, it errors out:
docker run -name zookeeper -p 2181 -p 2888 -p 3888 myusername/zookeeper:3.4.6
Error:
Warning: '-n' is deprecated, it will be removed soon. See usage.
invalid value "zookeeper" for flag -a: valid streams are STDIN, STDOUT and STDERR
See 'docker run --help'.
flag provided but not defined: -name
See 'docker run --help'.
What am I missing here?
Please use --name instead.
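For example, the corrected command from the question (same image and port options, only the flag changed) would be:
docker run --name zookeeper -p 2181 -p 2888 -p 3888 myusername/zookeeper:3.4.6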
Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
Run a command in a new container
-a, --attach=[] Attach to STDIN, STDOUT or STDERR
--add-host=[] Add a custom host-to-IP mapping (host:ip)
--blkio-weight=0 Block IO weight (relative weight)
-c, --cpu-shares=0 CPU shares (relative weight)
--cap-add=[] Add Linux capabilities
--cap-drop=[] Drop Linux capabilities
--cgroup-parent="" Optional parent cgroup for the container
--cidfile="" Write the container ID to the file
--cpu-period=0 Limit CPU CFS (Completely Fair Scheduler) period
--cpu-quota=0 Limit CPU CFS (Completely Fair Scheduler) quota
--cpuset-cpus="" CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems="" Memory nodes (MEMs) in which to allow execution (0-3, 0,1)
-d, --detach=false Run container in background and print container ID
--device=[] Add a host device to the container
--dns=[] Set custom DNS servers
--dns-search=[] Set custom DNS search domains
-e, --env=[] Set environment variables
--entrypoint="" Overwrite the default ENTRYPOINT of the image
--env-file=[] Read in a file of environment variables
--expose=[] Expose a port or a range of ports
--group-add=[] Add additional groups to run as
-h, --hostname="" Container host name
--help=false Print usage
-i, --interactive=false Keep STDIN open even if not attached
--ipc="" IPC namespace to use
-l, --label=[] Set metadata on the container (e.g., --label=com.example.key=value)
--label-file=[] Read in a file of labels (EOL delimited)
--link=[] Add link to another container
--log-driver="" Logging driver for container
--log-opt=[] Log driver specific options
--lxc-conf=[] Add custom lxc options
-m, --memory="" Memory limit
--mac-address="" Container MAC address (e.g. 92:d0:c6:0a:29:33)
--memory-swap="" Total memory (memory + swap), '-1' to disable swap
--memory-swappiness="" Tune a container's memory swappiness behavior. Accepts an integer between 0 and 100.
--name="" Assign a name to the container
--net="bridge" Set the Network mode for the container
--oom-kill-disable=false Whether to disable OOM Killer for the container or not
-P, --publish-all=false Publish all exposed ports to random ports
-p, --publish=[] Publish a container's port(s) to the host
--pid="" PID namespace to use
--privileged=false Give extended privileges to this container
--read-only=false Mount the container's root filesystem as read only
--restart="no" Restart policy (no, on-failure[:max-retry], always)
--rm=false Automatically remove the container when it exits
--security-opt=[] Security Options
--sig-proxy=true Proxy received signals to the process
-t, --tty=false Allocate a pseudo-TTY
-u, --user="" Username or UID (format: <name|uid>[:<group|gid>])
--ulimit=[] Ulimit options
--disable-content-trust=true Skip image verification
--uts="" UTS namespace to use
-v, --volume=[] Bind mount a volume
--volumes-from=[] Mount volumes from the specified container(s)
-w, --workdir="" Working directory inside the container
