How to configure a Zeppelin notebook to use only X cores in my local Spark cluster? - apache-zeppelin

Config - Ubuntu; Apache Zeppelin (0.7.3); Spark 2.2.0; Hadoop 2.6;
A cluster of 6 machines with 14 GB RAM & 4 cores each. I need to split these between two notebooks. Please advise.

For Zeppelin 0.7.3, check the Zeppelin interpreter settings.
A dropdown option is available to scope the SparkContext globally or per note.
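To cap how many cores each notebook's SparkContext takes from the standalone cluster, you can also set the Spark properties in that interpreter setting. A minimal sketch, assuming a standalone master and with illustrative values (6 machines x 4 cores = 24 cores total, split evenly between the two notebooks; executor memory sized so two apps fit in 14 GB per node):

master                  spark://<spark-master-host>:7077
spark.cores.max         12
spark.executor.memory   6g

With the interpreter instantiated per note in isolated mode, each notebook gets its own SparkContext, and each is then limited to the cores and memory you give it here.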

Related

Connection drop from PostgreSQL on Azure virtual machine

I am a bit new to PostgreSQL. I have set up a PostgreSQL DB on Azure Cloud.
It's an Ubuntu 18.04 LTS machine (4 vCPU, 8 GB RAM) with PostgreSQL 9.6.
The problem occurs when the connection to the PostgreSQL DB stays idle for some time, say 2 to 10 minutes: the connection stops responding, the request is never fulfilled, and the query just keeps processing.
The same happens with my Java Spring Boot application: the connection doesn't respond and the query keeps processing.
This happens at random, so the timing isn't traceable: sometimes it happens after 2 minutes, sometimes after 10, and sometimes not at all.
I have tried the PostgreSQL configuration file parameters tcp_keepalive_idle, tcp_keepalive_interval, and tcp_keepalive_count,
as well as statement_timeout and session_timeout, but nothing changed.
Any suggestion or help would be appreciated.
Thank you.
If you are setting up a PostgreSQL DB connection on an Azure VM, you have to be aware that there are inbound and outbound connection timeouts. According to
https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-outbound-connections#idletimeout, outbound connections have a 4-minute idle timeout, and this timeout is not adjustable. The inbound timeout can be changed in the Azure Portal.
We ran into a similar issue and were able to resolve it on the client side by changing the default Spring Boot Hikari configuration as follows:
hikari:
  connection-timeout: 20000
  validation-timeout: 20000
  idle-timeout: 30000
  max-lifetime: 40000
  minimum-idle: 1
  maximum-pool-size: 3
  connection-test-query: SELECT 1
  connection-init-sql: SELECT 1
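Note that idle-timeout (30 s) and max-lifetime (40 s) stay well below Azure's 4-minute outbound idle timeout, so the pool retires connections before the load balancer can silently drop them, and connection-test-query validates each connection before it is handed out.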

Configuring Ports for Flink Job/Task Manager Metrics

I am running Flink in Amazon EMR. In flink-conf.yaml, I have metrics.reporter.prom.port: 9249-9250
Depending on whether the job manager and task manager are running on the same node, the task manager metrics are reported on port 9250 (if running on the same node as the job manager) or on port 9249 (if running on a different node).
Is there a way to configure so that the task manager metrics are always reported on port 9250?
I saw a post saying that we can "provide each *Manager with a separate configuration." How do I do that?
Thanks
You can configure different ports for the JM and TM by starting the processes with differently configured flink-conf.yaml files; see the sketch below.
On YARN, however, Flink currently uses the same flink-conf.yaml for all processes.
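For a standalone deployment where you control each node's configuration, a minimal sketch (the reporter class line is the usual Prometheus reporter setup you most likely already have; the ports are the ones from the question):

# flink-conf.yaml on the JobManager node
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249

# flink-conf.yaml on TaskManager-only nodes
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250

With a fixed port per role instead of the 9249-9250 range, task manager metrics always land on 9250. If a JobManager and a TaskManager share a node, the two processes would have to be pointed at different configuration directories (for example via FLINK_CONF_DIR) for this to work.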

Change Lucidworks Fusion 3.1.5 default cluster

I have installed the Lucidworks Fusion 3.1.5 server on my computer. As you know, the default Solr version for this Fusion release is 6.6, but I have already configured an existing ZooKeeper cluster with Solr v7.2.1. After a lot of research I managed to connect the existing Solr 7.2 cluster with Fusion 3.1.5, but when Fusion creates the signals, connectors, and other cores for my Solr 7.2 collections, it saves them in the default Fusion cluster (Solr 6.6).
How can I change this behavior? For example, I need the signals, logs, and other cores that belong to the Solr 7.2 collections to be saved in Solr 7.2. Is there a way to change these Fusion settings?
Thank you.
You can do this by pointing your fusion.properties configuration to your ZK and Solr instances.
default.zk.connect = 192.168.1.1:2181, 192.168.1.2:2181, 192.168.1.3:2181, 192.168.1.4:2181, 192.168.1.5:2181
default.solrZk.connect = 192.168.1.1:2181, 192.168.1.2:2181, 192.168.1.3:2181, 192.168.1.4:2181, 192.168.1.5:2181
Obviously, adjust the IPs/ports for the Solr and ZK server nodes.
Then make sure that group.default does not have solr and zookeeper listed as services to load.
group.default = api, connectors, ui
I would suggest deleting your instance and starting fresh by exploding the archive, configuring fusion.properties, and then starting up the instance.

Solr Configuration for Hortonworks HA

I'm implementing the Hortonworks standby NameNode (High Availability) and I'm wondering how to configure Solr to point to the cluster name instead of the NameNode hostname, since the active NameNode might change in case of a failover.
<str name="solr.hdfs.home">??????</str>
I tried to configure Solr in several ways without success:
1) using the cluster name
2) using a ","-separated list of the hostnames of both the active and standby NameNodes
3) using a ";"-separated list of the hostnames of both the active and standby NameNodes
Do you have any suggestions?
Thanks
Regards
Farhad
You need to configure the cluster name (the HA nameservice) instead of a single NameNode's FQDN. The cluster name is defined when the HA cluster is created, and that same name should be given in solr.hdfs.home to achieve HA. The Hadoop client configurations (hdfs-site.xml, core-site.xml) must also be available on the machine running Solr, and the hadoop.home system property should point to the directory where those *-site.xml files reside.
You also need to add the HDFS configuration directory, -Dsolr.hdfs.confdir={hadoop_conf}, at startup. In Hortonworks it is usually /etc/hadoop/conf.
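As a minimal sketch, assuming the HA nameservice is called mycluster (use whatever dfs.nameservices value your cluster defines) and the Hadoop client configs live in /etc/hadoop/conf, the HdfsDirectoryFactory section of solrconfig.xml would look something like:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- point at the HA nameservice, not a single NameNode hostname -->
  <str name="solr.hdfs.home">hdfs://mycluster/solr</str>
  <!-- directory containing hdfs-site.xml and core-site.xml -->
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>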

What is the port number for Cassandra and Solr?

I read the documentation, which says 7199 is the JMX port, 8983 is the Solr port, and 9160 is the Cassandra client port. But if I start
dse cassandra -s
it starts Solr. If I then try to start Cassandra on the same machine with
dse cassandra -f
it says:
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
java.net.BindException: Address already in use
So I understand that both try to use the same JMX port.
Is there any way to specify two port numbers, one for Solr and one for Cassandra, or is there any way to start both on the same machine?
I am using the DataStax 2.2.2 tarball setup.
Any ideas?
You only need to start DSE once. It runs Search and C* in the same JVM and serves on all the ports you mentioned above.
For a tarball install, use this command to start DSE in search mode, and do this across your cluster (rolling restart, no downtime required):
bin/dse cassandra -s
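As a quick sanity check after the restart (assuming the usual tarball layout with dsetool under bin/), you can confirm the node came up with the Search workload:

bin/dsetool ring

Each node is listed with its workload; a node started with -s should show Search rather than plain Cassandra.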
