Cloudera cannot find class 'org.apache.hadoop.hive.solr.SolrStorageHandler' - solr

I'm using SolrStorageHandler to insert data into Solr from Hive tables.
I'm following this tutorial, but when launching the Oozie workflow I get the error
"Error while compiling statement: FAILED: SemanticException Cannot find class 'org.apache.hadoop.hive.solr.SolrStorageHandler' (state=42000,code=40000)".
I cannot find SolrStorageHandler.jar on the web to add it manually to the Hive script. Am I doing something wrong, or is this approach deprecated? If it's deprecated, how can I insert data into Solr from Hive in an efficient way?
btw. Cloudera version is 5.4.3
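For context, this kind of tutorial usually boils down to a Hive table declaration like the sketch below. The jar name, path, column list, and table property key are illustrative assumptions, not values confirmed by Cloudera or the tutorial:

-- Hypothetical: register the jar that ships the storage handler
ADD JAR /path/to/hive-solr-handler.jar;

-- Hypothetical: declare an external table backed by Solr through the handler
CREATE EXTERNAL TABLE solr_items (id STRING, name STRING)
STORED BY 'org.apache.hadoop.hive.solr.SolrStorageHandler'
TBLPROPERTIES ('solr.url' = 'http://localhost:8983/solr/collection1');

-- Inserting into the table would then push rows to Solr
INSERT INTO TABLE solr_items SELECT id, name FROM some_hive_table;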

Related

SSIS to Oracle "Could not create a managed connection manager."

I'm trying to use SSIS to load some data from an Oracle database to an MSSQL database.
I created the project and used the ADO.Net source and was able to create a connection to Oracle and run queries and view results.
However, when I actually run the package I get the following error:
Error: 0xC0208449 at Data Flow Task, ADO NET Source 2: ADO NET Source has failed to acquire the connection {EECB236A-59EA-475E-AE82-52871D15952D} with the following error message: "Could not create a managed connection manager.".
It seems similar to the issue here.
I did find that I have two Oracle client versions installed, "11.1" and "12.2".
One is used by PL/SQL and the other by another Entity Framework project.
If this is the issue, I just want a way to tell SSIS to pick up the correct one.
I tried adding an entry in machine.config for the "oracle.manageddataaccess.client" section with the desired version (see the sketch below).
I also tried using other types of data sources, but couldn't even create a successful connection.
I tried changing the Run64bitRuntime property in the project to False.
Note: I don't have SSIS installed on my machine.
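For reference, the machine.config attempt above would look roughly like the sketch below. This is only an illustrative binding-redirect fragment; the version numbers are placeholders and the whole approach is an assumption, not a verified fix:

<!-- Hypothetical machine.config fragment: force every Oracle.ManagedDataAccess
     load to one installed version (version numbers are placeholders) -->
<runtime>
  <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
    <dependentAssembly>
      <assemblyIdentity name="Oracle.ManagedDataAccess"
                        publicKeyToken="89b483f429c47342" culture="neutral" />
      <bindingRedirect oldVersion="4.121.0.0-4.122.99.99" newVersion="4.122.1.0" />
    </dependentAssembly>
  </assemblyBinding>
</runtime>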
Eventually, I just had to remove the entries related to 11.1 from the PATH variable and then restart my machine.
I also switched to "dotConnectForOracle" for the connection, and now it seems to be working fine.
I'm expecting issues with other applications that might still be using the 11.1 version, but that will be a problem for another day.
Always make sure to write the user (Oracle schema) in uppercase, and note that some special characters (in my case it was $) in the password need an escape character, even if you're using the wizard rather than the cmd.
I still don't understand the whole issue, but I hope this helps someone some day.

How to integrate Zeppelin with Solr

After creating the Solr interpreter, I am trying to query a collection through ZooKeeper; however, it throws an exception: Caused by: org.noggit.JSONParser$ParseException:
Are you trying to query it directly? You can go via the JDBC service instead; it wasn't too complicated:
https://lucene.apache.org/solr/guide/7_7/solr-jdbc-apache-zeppelin.html
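As a rough sketch of what that guide sets up (host, port, and collection name here are placeholders, not values from the question): create a Zeppelin JDBC interpreter pointing at Solr's SQL driver,

default.driver = org.apache.solr.client.solrj.io.sql.DriverImpl
default.url = jdbc:solr://zkhost:2181/?collection=mycollection

then run plain SQL from a paragraph bound to that interpreter, e.g. select id from mycollection limit 10. This goes through Solr's parallel SQL layer rather than hitting ZooKeeper directly.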

Run Cassandra query using Solr without using Datastax

I currently have a Solr indexing schema set up with one of my databases in Cassandra. I am able to run queries using the Solr Admin UI when visiting my local IP address with the appropriate port. However, I am unable to run any CQL Solr queries. Running a line such as SELECT * FROM keyspace.table WHERE solr_query='name: cat name: dog -name:fish' results in an "unknown column solr_query" error. I also tried adding the line <requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query" /> to my solrconfig.xml, as mentioned in https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchCql.html, but got a "Could not load class" error. Any idea what I'm doing wrong?
Thanks!

Error while re-loading the Solr core after schema.xml is modified: could not achieve replication factor 1 (found 0 replicas only)

I am currently working with Cassandra in Solr mode and have started running Cassandra with Solr.
Using DSE 4.7 with Cassandra 2.1.8.
./dsetool create_core vin_service_development.vinid_search1 generateResources=true reindex=true
This created the indexes successfully, and I am able to see the table in the Core Selector list at http://10.14.210.22:8983/solr/#/
I then changed the schema.xml field type from "TextField" to "StrField" and want to reload the changes made to the schema.xml file.
I executed the command below:
./dsetool reload_core vin_service_development.vinid_search1 reindex=true solrconfig=solr.xml
solr.xml is placed in the same path as dsetool.
Error Info:
brsblcdb012:/apps/apg-data.cassandra/bin ./dsetool reload_core vin_service_development.vinid_search1 reindex=true solrconfig=solr.xml
WARN 20:21:14 Error while computing token map for datacenter datacenter1: could not achieve replication factor 1 (found 0 replicas only), check your keyspace replication settings. Note that this can affect the performance of the driver.
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error in xpath:/config/luceneMatchVersion for solrconfig.xml
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:665)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:303)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:294)
at com.datastax.bdp.tools.SearchDseToolCommands.createOrReloadCore(SearchDseToolCommands.java:383)
at com.datastax.bdp.tools.SearchDseToolCommands.access$200(SearchDseToolCommands.java:53)
at com.datastax.bdp.tools.SearchDseToolCommands$ReloadCore.execute(SearchDseToolCommands.java:201)
at com.datastax.bdp.tools.DseTool.run(DseTool.java:114)
at com.datastax.bdp.tools.DseTool.run(DseTool.java:51)
at com.datastax.bdp.tools.DseTool.main(DseTool.java:174)
Is this the correct way to reload the core in Solr after making changes to the XML files?
Update:
One of my keyspaces was using NetworkTopologyStrategy earlier. I changed this to SimpleStrategy, so now all the keyspaces in the Solr datacenter use SimpleStrategy.
After executing the same command, I got this error:
brsblcdb012:/apps/apg-data.cassandra/bin ./dsetool reload_core vin_service_development.vinid_search1 reindex=true solrconfig=solr.xml
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error in xpath:/config/luceneMatchVersion for solrconfig.xml
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:665)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:303)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:294)
at com.datastax.bdp.tools.SearchDseToolCommands.createOrReloadCore(SearchDseToolCommands.java:383)
at com.datastax.bdp.tools.SearchDseToolCommands.access$200(SearchDseToolCommands.java:53)
at com.datastax.bdp.tools.SearchDseToolCommands$ReloadCore.execute(SearchDseToolCommands.java:201)
at com.datastax.bdp.tools.DseTool.run(DseTool.java:114)
at com.datastax.bdp.tools.DseTool.run(DseTool.java:51)
at com.datastax.bdp.tools.DseTool.main(DseTool.java:174)
What would be the recommended change now?
To sum up the conversation:
The keyspace replication configuration was initially wrong (updated to SimpleStrategy RF 2):
Your nodes are now in datacenter 'Solr', but one of your keyspaces is configured with NetworkTopologyStrategy and a replication factor referencing 'datacenter1'.
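A minimal CQL sketch of that change, using the keyspace name from the question and the RF 2 value from this summary (adjust both for your cluster):

ALTER KEYSPACE vin_service_development WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};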
You had accidentally replaced your solrconfig with the wrong XML, which caused this error. To fix it you can recreate your Solr core.
In DSE 4.8 you can remove your Solr core using unload_core and recreate it. If you are on an older version of DSE, you can follow 'Remove core from Datastax Solr'.
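Roughly, that unload-and-recreate cycle would look like the sketch below, reusing the core name and options from the question (a sketch only; check dsetool's options for your exact DSE version):

./dsetool unload_core vin_service_development.vinid_search1
./dsetool create_core vin_service_development.vinid_search1 generateResources=true reindex=true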

Spark dataframe not appending to the table

I am unable to append to a table when writing to it using dataframe write.
Here is the command I am using:
df1.write.mode("append").jdbc("jdbc:jtds:sqlserver://noi-nipuna-w81/sams","sams", props)
This is the exception I get:
java.sql.SQLException: There is already an object named 'sams' in the database.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
Where am I making a mistake?
I am using Spark version 1.4.
Urug and Hafiz have already answered. The code below did the job for me:
sqlDF
.write()
.mode(SaveMode.Append)
.jdbc(jdbcDBUrl, "DBName.SchemaName.TableName", connProperties);
I had the same problem with MS SQL Server. I found that updating Spark to 1.6.0 fixed the problem (link to the other SO answer). The reason was a buggy query for checking if the table exists.
