Connecting to an Apache Atlas + HBase + Solr setup with the Gremlin CLI

I am new to Atlas and JanusGraph. I have a local setup of Atlas with HBase and Solr as the backends, with dummy data.
I would like to use the Gremlin CLI + Gremlin server and connect to the existing data in HBase, i.e. view and traverse the dummy Atlas metadata objects.
This is what I have done so far:
Run the Atlas server + HBase + Solr and insert dummy entities
Run the Gremlin server with the right configuration
I have set graph: { ConfigurationManagementGraph: .. } to janusgraph-hbase-solr.properties
Run the Gremlin CLI and connect with :remote connect tinkerpop.server conf/remote.yaml session, which connects to the Gremlin server just fine
I do graph = JanusGraphFactory.open(..../janusgraph-hbase-solr.properties) and create g = graph.traversal()
I am able to create my own vertices and edges and list them, but I am not able to list anything related to Atlas, i.e. entities etc.
What am I missing?
I want to connect to the existing Atlas setup and traverse the graph with the Gremlin CLI.
Thanks

To be able to access Atlas artifacts from the Gremlin CLI, you will have to add the Atlas dependency JARs to JanusGraph's lib directory.
You can get the JARs from the Atlas Maven repository or from your local build.
$ cp atlas-* janusgraph-0.3.1-hadoop2/lib/
List of JARs:
atlas-common-1.1.0.jar
atlas-graphdb-api-1.1.0.jar
atlas-graphdb-common-1.1.0.jar
atlas-graphdb-janus-1.1.0.jar
atlas-intg-1.1.0.jar
atlas-repository-1.1.0.jar
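If you do not have a local build, the same JARs can usually be pulled from Maven Central instead. A sketch (it assumes the artifacts are published under org/apache/atlas in the standard repository layout):
ATLAS_VERSION=1.1.0
for artifact in atlas-common atlas-graphdb-api atlas-graphdb-common atlas-graphdb-janus atlas-intg atlas-repository; do
  # download each Atlas artifact straight into JanusGraph's lib directory
  wget -P janusgraph-0.3.1-hadoop2/lib/ \
    "https://repo.maven.apache.org/maven2/org/apache/atlas/${artifact}/${ATLAS_VERSION}/${artifact}-${ATLAS_VERSION}.jar"
done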
A sample query could be:
gremlin> :> g.V().has('__typeName','hive_table').count()
==>10

As ThiagoAlvez mentioned, the Atlas Docker image can be used since TinkerPop Gremlin support is now built into it, and it can easily be used to play with JanusGraph and Atlas artifacts using the Gremlin CLI:
Pull the image:
docker pull sburn/apache-atlas
Start Apache Atlas in a container exposing Web-UI port 21000:
docker run -d \
-p 21000:21000 \
--name atlas \
sburn/apache-atlas \
/opt/apache-atlas-2.1.0/bin/atlas_start.py
Install gremlin-server and gremlin-console into the container by running the included automation script:
docker exec -ti atlas /opt/gremlin/install-gremlin.sh
Start gremlin-server in the same container:
docker exec -d atlas /opt/gremlin/start-gremlin-server.sh
Finally, run gremlin-console interactively:
docker exec -ti atlas /opt/gremlin/run-gremlin-console.sh

I had this same issue when trying to connect to the Apache Atlas JanusGraph database (org.janusgraph.diskstorage.solr.Solr6Index).
I got it solved after moving the Atlas JARs to the JanusGraph lib folder, as anand said, and then configuring janusgraph-hbase-solr.properties.
These are the configurations that I set in janusgraph-hbase-solr.properties:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=localhost
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=solr
index.search.solr.mode=http
index.search.solr.http-urls=http://localhost:9838/solr
index.search.solr.zookeeper-url=localhost:2181
index.search.solr.configset=_default
atlas.graph.storage.hbase.table=apache_atlas_janus
storage.hbase.table=apache_atlas_janus
I'm running Atlas using this docker image: https://github.com/sburn/docker-apache-atlas
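With those properties in place, a quick sanity check from the Gremlin console might look like this (a sketch; the properties path below is wherever you saved the file, and the count depends on your data):
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-hbase-solr.properties')
gremlin> g = graph.traversal()
gremlin> g.V().has('__typeName', 'hive_table').count()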

Related

How to sink messages to InfluxDB using PyFlink?

I am trying to run the PyFlink walkthrough, but instead of sinking data to Elasticsearch, I want to use InfluxDB.
Note: the code in the walkthrough (link above) is working as expected.
In order for this to work, we need to put an InfluxDB connector inside the Docker container.
The other Flink connectors are placed inside the container with these commands in the Dockerfile:
# Download connector libraries
RUN wget -P /opt/flink/lib/ https://repo.maven.apache.org/maven2/org/apache/flink/flink-json/${FLINK_VERSION}/flink-json-${FLINK_VERSION}.jar; \
wget -P /opt/flink/lib/ https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.12/${FLINK_VERSION}/flink-sql-connector-kafka_2.12-${FLINK_VERSION}.jar; \
wget -P /opt/flink/lib/ https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-elasticsearch7_2.12/${FLINK_VERSION}/flink-sql-connector-elasticsearch7_2.12-${FLINK_VERSION}.jar;
I need help in order to:
Put an InfluxDB connector into the container
Modify the CREATE TABLE statement below so that it works for InfluxDB
CREATE TABLE es_sink (
id VARCHAR,
value DOUBLE
) with (
'connector' = 'elasticsearch-7',
'hosts' = 'http://elasticsearch:9200',
'index' = 'platform_measurements_1',
'format' = 'json'
)
From the documentation:
The Table and SQL APIs currently (14/06/2022) do not support InfluxDB; a SQL/Table connector does not exist.
Here are the known connectors that you can use:
From Maven Apache Flink
From Apache Bahir
You can:
Use the Flink streaming connector for InfluxDB from Apache Bahir (DataStream API only)
or
Implement your own sink (a sketch follows below)
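To illustrate the "implement your own sink" route, here is a minimal PyFlink DataStream sketch that writes each record to InfluxDB from inside a map function using the influxdb-client Python package. Everything environment-specific below (URL, token, org, bucket, measurement) is a placeholder, and the package has to be installed in the Flink container:
# Sketch only: push (id, value) records to InfluxDB from PyFlink's DataStream API.
# Assumes the `influxdb-client` package is installed in the Flink image and an
# InfluxDB 2.x instance is reachable; url/token/org/bucket/measurement are placeholders.
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import MapFunction, RuntimeContext


class InfluxDBWriter(MapFunction):
    """Writes each record to InfluxDB as a side effect and passes it through unchanged."""

    def open(self, runtime_context: RuntimeContext):
        # Create the client lazily on the task manager, not on the job client.
        from influxdb_client import InfluxDBClient, Point
        from influxdb_client.client.write_api import SYNCHRONOUS
        self._Point = Point
        self._client = InfluxDBClient(url="http://influxdb:8086",  # placeholder host
                                      token="my-token",            # placeholder credentials
                                      org="my-org")
        self._write_api = self._client.write_api(write_options=SYNCHRONOUS)

    def map(self, record):
        record_id, value = record
        point = (self._Point("platform_measurements")  # placeholder measurement
                 .tag("id", record_id)
                 .field("value", float(value)))
        self._write_api.write(bucket="my-bucket", record=point)
        return record

    def close(self):
        self._client.close()


env = StreamExecutionEnvironment.get_execution_environment()
env.from_collection([("sensor-1", 0.5), ("sensor-2", 1.2)]) \
    .map(InfluxDBWriter()) \
    .print()
env.execute("influxdb-sink-sketch")
Note that writing from a map function is a pragmatic workaround rather than a true Flink sink (no checkpoint/transaction integration); for stronger guarantees you would wrap the Bahir Java connector or implement a proper sink function.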

Configure PostgreSQL DB in a Docker project

I'm very new to Docker and docker-compose things. I have tried to use the Kiwi TCMS open source project, which is supposed to be used with Docker.
My question is: can I run the projects on Docker on the same server? I'm supposed to configure my development and production sites on the same server (CentOS).
I'm following the link below to install Docker and configure the kiwitcms application for the first time. I have read the basics about Docker and how it works.
https://kiwitcms.readthedocs.io/en/latest/installing_docker.html#
I want to use PostgreSQL as my database, but the existing latest Docker image uses MariaDB. So after I pulled the latest version of kiwitcms from Docker Hub using the following command,
docker pull kiwitcms/kiwi
should I change the db image value in the "docker-compose.yml" file and save it to a new local directory?
https://raw.githubusercontent.com/kiwitcms/Kiwi/master/docker-compose.yml
Also, can we edit it there to use our own DB name, user name and password, right?
Then execute the following command:
docker-compose up -d
I'm very new to this model and have read most of the articles related to Docker, but I still have a few doubts, hence asking for these clarifications.
Thanks,
Karthik.
The web app and the DB server are two different images (i.e. two different servers). For an example of how we use Postgres in testing, see:
https://github.com/kiwitcms/Kiwi/blob/master/docker-compose.postgres
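For orientation, a trimmed-down sketch of what such a compose file can look like is below. Treat the linked docker-compose.postgres as the authoritative version; the image tags, ports, volume name and credentials here are placeholders, and the KIWI_DB_* variable names follow the Kiwi documentation:
# Sketch only - see the linked docker-compose.postgres for the real file.
version: '2'

services:
  db:
    image: postgres:latest          # placeholder tag
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: kiwi             # your own DB name
      POSTGRES_USER: kiwi           # your own user
      POSTGRES_PASSWORD: kiwi       # your own password

  web:
    image: kiwitcms/kiwi:latest
    ports:
      - "80:8080"
      - "443:8443"
    depends_on:
      - db
    environment:
      KIWI_DB_ENGINE: django.db.backends.postgresql
      KIWI_DB_HOST: db
      KIWI_DB_PORT: 5432
      KIWI_DB_NAME: kiwi            # must match POSTGRES_DB above
      KIWI_DB_USER: kiwi
      KIWI_DB_PASSWORD: kiwi

volumes:
  db_data:
This also answers the credentials question: the DB name, user and password are plain environment variables on the two services, so you set your own values there.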

Migrating from Standalone Apache SOLR to SOLR Cloud

We have a standalone Apache Solr setup on a local server for testing purposes, with around 10 cores created in it. Now we are planning to set up a new SolrCloud environment in AWS with numShards: 3 and replicationFactor: 3.
Is there any way to transfer the existing Apache Solr cores (schema and data) to the new SolrCloud environment in AWS?
Use ZooKeeper to manage the configuration files. Upload each configuration to ZooKeeper: bin/solr zk upconfig -n <config_name> -d <path_to_configset>
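A sketch of the per-core steps (names and paths below are placeholders; it assumes ZooKeeper is reachable on localhost:2181 and the old core's conf directory has been copied to the new environment):
# upload the old core's config directory to ZooKeeper as a named configset
bin/solr zk upconfig -z localhost:2181 -n my_core_config -d /path/to/old_core/conf
# create the collection against that configset with the planned shard/replica layout
bin/solr create -c my_core -n my_core_config -shards 3 -replicationFactor 3
The documents themselves are not moved by this; they typically have to be reindexed into the new collection (or restored via the Collections API backup/restore).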

IBM Cloud Private-Community Edition - Waiting for cloudant database initialization

I tried the below command:
docker run --rm -t -e LICENSE=accept --net=host -v "$(pwd)":/installer/cluster ibmcom/icp-inception:2.1.0 install
the response is
Waiting for cloudant initialization
I entered the command and received the logs shown in the image. No error is shown. Please give a solution.
From the error message, the Cloudant database initialization issue may be caused by the Cloudant Docker image being pulled from Docker Hub during the ICP installation. The Cloudant Docker image is big; you can run the command below to check whether the image is ready in your environment.
$ docker images | grep icp-datastore
If the Cloudant Docker image is ready in your environment and the ICP installation still has the Cloudant database initialization issue, you can try to install the latest ICP 2.1.0.3 Community Edition. Starting with 2.1.0.3, ICP removes the Cloudant database. The ICP 2.1.0.3 installation documentation:
https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0.3/installing/install_containers_CE.html
If you still want to check the Cloudant database initialization issue in the ICP 2.1.0.1 environment, you can:
Ensure your ICP nodes meet the system and hardware requirements first.
https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0/supported_system_config/system_reqs.html
Let us know the ICP installation configuration. You can check the contents of the config.yaml and hosts files.
Check the system logs (in /var/log/messages or /var/log/syslog file) to find the relevant errors.
Run the 'docker logs <container_name>' command to check the logs or errors.

SOLR on Elastic Beanstalk

I would like to run a Solr server on Elastic Beanstalk, but I cannot find much about that on the web.
It must be possible somehow, because some are using it already (e.g. https://forums.aws.amazon.com/thread.jspa?threadID=91276).
Any ideas how I could do that?
Well, somehow I can upload the Solr WAR file into the environment, but then it gets complicated.
Where do I put the config files and the index directory so that each instance can reach them?
EDIT: Please keep in mind that this answer is from 2013. The products mentioned here have likely evolved. I have updated the documentation link to reflect changes in the solr clustering wiki. I encourage you to continue your research after reading this information.
ORIGINAL:
It only really makes sense to run Solr on Beanstalk instances if you are planning to only ever use the single-server deploy. The minute you want to scale your app, you will need to configure your Beanstalk environment to either create a Solr cluster or move to something like CloudSearch. If you are unfamiliar with EC2 lifecycles and Solr deployments, then CloudSearch will almost certainly save you time (read: money).
If you do want to run Solr on a single instance, then you can use rake to launch it by adding a file named .ebextensions/solr.config to your local repo with the following contents:
container_commands:
  01create_post_dir:
    command: "mkdir -p /opt/elasticbeanstalk/hooks/appdeploy/post"
    ignoreErrors: true
  02killjava:
    command: "killall java"
    test: "ps uax | grep java | grep root"
    ignoreErrors: true
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/99_start_solr.sh":
    mode: "755"
    owner: "root"
    group: "root"
    content: |
      #!/usr/bin/env bash
      . /opt/elasticbeanstalk/support/envvars
      cd $EB_CONFIG_APP_CURRENT
      su -c "RAILS_ENV=production bundle exec rake sunspot:solr:start" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake db:seed" $EB_CONFIG_APP_USER
      su -c "RAILS_ENV=production bundle exec rake sunspot:reindex" $EB_CONFIG_APP_USER
Please keep in mind that this will cause chaos if you are using autoscaling.
