How to enable Spark in Datastax Datacenter?

Our current DataStax datacenter contains 6 nodes, all with both Solr and Graph enabled:
root@ip-10-10-5-36:~# cat /etc/default/dse | grep -E 'SOLR_ENABLED|GRAPH_ENABLED'
GRAPH_ENABLED=1
SOLR_ENABLED=1
root@ip-10-10-5-36:~# nodetool status
Datacenter: SearchGraph
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns  Host ID                               Rack
UN  10.10.5.56  456.58 MiB  1       ?     936a1ac0-6d5e-4a94-8953-d5b5a2016b92  rack1
UN  10.10.5.46  406.24 MiB  1       ?     3f41dc2a-2672-47a1-90b5-a7c2bf17fb50  rack1
UN  10.10.5.76  392.99 MiB  1       ?     29f8fe44-3431-465e-b682-5d24e37d41d7  rack2
UN  10.10.5.66  414.16 MiB  1       ?     1f7de531-ff51-4581-bdb8-d9a686f1099e  rack2
UN  10.10.5.86  424.3 MiB   1       ?     27d37833-56c8-44bd-bac0-7511b8bd74e8  rack2
UN  10.10.5.36  511.44 MiB  1       ?     0822145f-4225-4ad3-b2be-c995cc230830  rack1
We are planning to enable Spark in our existing datacenter. My questions are:
1) Will enabling Spark affect the existing data and services in DataStax?
2) Or, instead of setting SPARK_ENABLED=1, do we need to set up a separate datacenter for Spark?
Update:
3) How do DC1 and DC2 connect to each other in the ring? Is it based on the same cluster name specified in the cluster_name: parameter?
Conf file: /etc/dse/cassandra/cassandra.yaml
4) Is any separate configuration needed to specify the Spark master in the datacenter?
5) Do I need to specify the SearchGraph (DC1) seed IP in the Spark (DC2) seed configuration section, or does only the Spark seed IP need to be specified in the DC2 configuration (cassandra.yaml)?

It's recommended to create a separate datacenter for DSE Analytics. The full process is described in the documentation.

To augment Alex's answer: this depends on whether you'd like to run Graph Analytics or not. What type of Spark work will be performed once it's enabled?

Related

Mongodb Primary shard cluster down

I just want to understand the scenario where a primary shard cluster goes down.
I have a MongoDB setup where I have 4 shards, each running as a replica set:
shard-1: Server 1 (Primary), Server 2 (Secondary), Server 3 (Secondary)
shard-2: Server 4 (Primary), Server 5 (Secondary), Server 6 (Secondary)
shard-3: Server 7 (Primary), Server 8 (Secondary), Server 9 (Secondary)
I have a single database and a single collection, so I assume it is distributed across all 3 shards as chunks and the balancer is doing its job, right?
In that case, if shard-1 (the whole cluster) goes down, will traffic continue normally or will it be hampered?

I think you are mixing up sharding and replication.
What do you mean by "if shard-1 (cluster) goes down"? That means you lose 3 servers at once! Is this a probable situation that you want to investigate? If so, I would say your cluster is poorly designed.
Sharding needs to be enabled at the database level and at the collection level. Based on the information you provided, no answer can be given.

Well, I was just describing a scenario where a complete cluster might go down. Let's say I have one cluster in a single DC and that DC is down, so that complete cluster is unavailable, while the other 2 clusters are in different DCs and are up and running.
OK, let's try another way. If the primary replica set member of a primary shard goes down, an election is held, a new primary is announced, and everything is back to normal, right?
But that only works if the cluster has other available nodes. What if the complete cluster goes down?
What is the use of the other 2 shard clusters then?
Or do I have a wrong understanding of sharding?
Also, yes, I have enabled sharding on both my DB and my collection.

OK, here is why I was asking, and my actual problem: I have 4 shards, all running as replica sets.
When I check sh.status(), I see the output below:
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: yes
        Failed balancer rounds in last 5 attempts: 0
        Migration Results for the last 24 hours:
                7641 : Failed with error 'aborted', from MCA2 to MCA4
  databases:
        { "_id" : "MCA", "primary" : "MCA2", "partitioned" : true, "version" : { "uuid" : UUID("xxxxxxxxxxxx"), "lastMod" : 1 } }
                xxxxxxx
                        shard key: { "xxxxx" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                MCA     1658
                                MCA2    1692
                                MCA3    1675
                                MCA4    1670
So my simple question is: if the primary shard MCA2 goes down, what will happen? Will the collection (xxxxxxx) become inaccessible to the application?
Also, as I understand it, I have a 3-node cluster, so any node can go down and another becomes primary to serve traffic. So as long as any node in my primary shard is alive, the shard is alive and can serve traffic to the application, right?
If yes, then let's say the complete replica set of primary shard MCA2 is down. What now?
If no, then what will happen?
(I changed the actual collection name and shard key to xxxxxxx for security reasons.)

Oracle Database performance is slow

Overall database performance is slow in one of our production environments.
I have attached Statspack reports for two periods on 15/02/16: 09:00 AM - 02:00 PM and 03:00 PM - 07:00 PM GMT.
DB details:
Oracle 11g 11.2.0.3.0 - Standard Edition
OS memory: 11.2GB
The current database SGA and PGA sizes are:
sga_max_size : 5G
sga_target : 5G
pga_aggregate_target : 1G
db_cache_size : 2080M
memory_max_target : 0
memory_target : 0
Please advise on this.
Ram

Run an AWR report using dbms_workload_repository (for HTML output, use the AWR_DIFF_REPORT_HTML function) or Oracle Enterprise Manager, and check what is consuming the most DB time, CPU time, I/O operations, etc.
There can be dozens of different causes for an issue this unspecific.
Regarding the SGA/PGA specifically:
you can also query gv$sga_target_advice and gv$pga_target_advice and check whether any of the pools is short of memory (the more advanced and precise option is gv$sga_resize_ops).

cassandra connection error: Unable to connect to any servers

Cassandra doesn't work on my VM.
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
When I start the server with the command:
cassandra
......
INFO 07:55:31 Enqueuing flush of local: 578 (0%) on-heap, 0 (0%) off-heap
INFO 07:55:31 Writing Memtable-local@2014850649(0.081KiB serialized bytes, 4 ops, 0%/0% of on/off-heap limit)
INFO 07:55:31 Completed flushing /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/tmp-la-305-big-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1448697324414, position=105487)
INFO 07:55:31 Enqueuing flush of local: 51468 (0%) on-heap, 0 (0%) off-heap
INFO 07:55:31 Writing Memtable-local@280469114(8.354KiB serialized bytes, 259 ops, 0%/0% of on/off-heap limit)
INFO 07:55:31 Completed flushing /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/tmp-la-306-big-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1448697324414, position=117466)
INFO 07:55:32 Node localhost/127.0.0.1 state jump to normal
INFO 07:55:32 Compacted (64dd8610-95a5-11e5-af1d-a752adc4283f) 4 sstables to [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la-91-big,] to level=0. 20,658 bytes to 20,029 (~96% of original) in 2,376ms = 0.008039MB/s. 0 total partitions merged to 225. Partition merge counts were {1:225, }
then cqlsh works:
cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]
Use HELP for help.
cqlsh>
But a few minutes later, cqlsh fails again:
cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
Can anyone help me? Thanks in advance!

Sounds like the server is going down after a few minutes. You should check the logs for the reason.

I found the root cause: there was not enough memory. I added Linux swap space, and then everything was OK.
See: how to add swap on Ubuntu
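
To confirm this kind of root cause before adding swap, a rough check of total memory and configured swap can help. The following is only a sketch for Linux hosts: it assumes `/proc/meminfo` is available, and the helper names `parse_meminfo` and `has_swap` are hypothetical, not part of any Cassandra tooling.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Most fields look like "MemTotal:  1016232 kB"; keep the numeric part.
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def has_swap(meminfo):
    """Return True if the parsed meminfo reports a non-zero SwapTotal."""
    return meminfo.get("SwapTotal", 0) > 0


if __name__ == "__main__":
    # Only attempt the read on systems that expose /proc/meminfo (Linux).
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
        print("MemTotal (kB):", info.get("MemTotal"))
        print("Swap configured:", has_swap(info))
```

If the script reports no swap on a small VM, that is consistent with the server process being killed for lack of memory, but the Cassandra and system logs remain the authoritative place to confirm it.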

Disabling virtual nodes in an existing Solr DC

I have an existing cluster with the following topology:
DC Cassandra: 2 nodes
DC Solr: 5 nodes
All of the nodes currently use vnodes. I want to disable vnodes in the Solr DC for performance reasons.
According to this document, to disable vnodes:
In the cassandra.yaml file, set num_tokens to 1
Uncomment the initial_token property and set it to 1 or to the value of a generated token for a multi-node cluster.
Is this all that I need to do? (no repair, no cleanup, no anything?) Seems too good to be true for me.
As for token assignment, should I use the python code found here (for Murmur3) or should I reuse one of the existing tokens from the vnodes that the node currently has?

The only way to disable vnodes is to follow this procedure in reverse: http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configVnodesProduction_t.html
That is, create a new Solr DC with vnodes turned off and switch over to it.
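
On the token assignment question, the commonly used arithmetic for single-token nodes under the Murmur3 partitioner spaces the initial_token values evenly across the range -2^63 to 2^63 - 1. The sketch below shows only that arithmetic; the function name `murmur3_initial_tokens` is hypothetical, the node count of 5 is taken from the Solr DC in the question, and it does not check for token collisions with nodes in other datacenters.

```python
def murmur3_initial_tokens(node_count):
    """Return evenly spaced initial_token values for the Murmur3 token range."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 64  # Murmur3 tokens span -2**63 .. 2**63 - 1
    step = ring_size // node_count
    return [step * i - 2 ** 63 for i in range(node_count)]


if __name__ == "__main__":
    # Assumption: 5 nodes, as in the new Solr DC described in the question.
    for index, token in enumerate(murmur3_initial_tokens(5)):
        print(f"node {index}: initial_token: {token}")
```

Each printed value would go into the initial_token setting of one node in the new DC, with num_tokens set to 1.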

Find long running query on Informix?

How can you find out which queries are long-running on an Informix database server? I have a query that is using up the CPU and want to find out what the query is.

If the query is currently running, watch the onstat -g act -r 1 output and look for items with an rstcb that is not 0:
Running threads:
 tid     tcb               rstcb             prty  status   vp-class  name
 106     c0000000d4860950  0                 2     running  107soc    soctcppoll
 107     c0000000d4881950  0                 2     running  108soc    soctcppoll
 564457  c0000000d7f28250  c0000000d7afcf20  2     running  1cpu      CDRD_10
In this example, the third row is what is currently running. If you have multiple rows with non-zero rstcb values, watch for a while and look for the one that is always or almost always there. That is most likely the session that you're looking for.
c0000000d7afcf20 is the address that we're interested in for this example.
Use onstat -u | grep c0000000d7afcf20 to find the session
c0000000d7afcf20 Y--P--- 22887 informix - c0000000d5b0abd0 0 5 14060 3811
This gives you the session id which in our example is 22887. Use onstat -g ses 22887
to list info about that session. In my example it's a system session so there's nothing to see in the onstat -g ses output.
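
The first step above (picking out rows with a non-zero rstcb) can be scripted when the output is long. This is a minimal sketch that assumes the seven-column layout shown in the example, with a single-word status such as "running"; the function name `threads_with_rstcb` is hypothetical and the text would come from captured onstat -g act output.

```python
def threads_with_rstcb(onstat_act_output):
    """Return (tid, rstcb, name) for each thread row whose rstcb is not 0."""
    matches = []
    for line in onstat_act_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric thread id and have at least 7 columns:
        # tid tcb rstcb prty status vp-class name
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        tid, rstcb, name = fields[0], fields[2], fields[-1]
        if rstcb != "0":
            matches.append((tid, rstcb, name))
    return matches


if __name__ == "__main__":
    sample = """Running threads:
tid tcb rstcb prty status vp-class name
106 c0000000d4860950 0 2 running 107soc soctcppoll
107 c0000000d4881950 0 2 running 108soc soctcppoll
564457 c0000000d7f28250 c0000000d7afcf20 2 running 1cpu CDRD_10
"""
    for tid, rstcb, name in threads_with_rstcb(sample):
        print(f"thread {tid} ({name}): rstcb {rstcb}")
```

The rstcb address it reports is the value to search for in the onstat -u output.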

That's because the suggested answer is for DB2, not Informix.
The sysmaster database (a virtual relational database of Informix shared memory) will probably contain the information you seek. These pages might help you get started:
http://docs.rinet.ru/InforSmes/ch22/ch22.htm
http://www.informix.com.ua/articles/sysmast/sysmast.htm

Okay it took me a bit to work out how to connect to sysmaster. The JDBC connection string is:
jdbc:informix-sqli://dbserver.local:1526/sysmaster:INFORMIXSERVER=mydatabase
Where the port number is the same as when you are connecting to the actual database. That is if your connection string is:
jdbc:informix-sqli://database:1541/crm:INFORMIXSERVER=crmlive
Then the sysmaster connection string is:
jdbc:informix-sqli://database:1541/sysmaster:INFORMIXSERVER=crmlive
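
The substitution described above (same host, port, and INFORMIXSERVER, with only the database name swapped for sysmaster) can be sketched as a small helper. The function name `sysmaster_url` is hypothetical, and the pattern only covers URLs shaped like the examples in this answer.

```python
import re

# Matches jdbc:informix-sqli://host:port/database followed by optional properties.
_INFORMIX_URL = re.compile(r"^(jdbc:informix-sqli://[^/]+/)([^:]+)(.*)$")


def sysmaster_url(jdbc_url):
    """Return the same JDBC URL with the database name replaced by sysmaster."""
    match = _INFORMIX_URL.match(jdbc_url)
    if match is None:
        raise ValueError(f"not an Informix JDBC URL: {jdbc_url!r}")
    prefix, _database, properties = match.groups()
    return f"{prefix}sysmaster{properties}"


if __name__ == "__main__":
    print(sysmaster_url("jdbc:informix-sqli://database:1541/crm:INFORMIXSERVER=crmlive"))
```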
Also found this wiki page that contains a number of SQL queries for operating on the sysmaster tables.

SELECT ELAPSED_TIME_MIN,SUBSTR(AUTHID,1,10) AS AUTH_ID,
AGENT_ID, APPL_STATUS,SUBSTR(STMT_TEXT,1,20) AS SQL_TEXT
FROM SYSIBMADM.LONG_RUNNING_SQL
WHERE ELAPSED_TIME_MIN > 0
ORDER BY ELAPSED_TIME_MIN DESC
Credit: SQL to View Long Running Queries
