Hive SerDe errors with Array<Struct<>>: org.json.JSONArray cannot be cast to [Ljava.lang.Object;

I have created a table :
add jar /../xlibs/hive-json-serde-0.2.jar;
CREATE EXTERNAL TABLE SerdeTest
(Unique_ID STRING
,MemberID STRING
,Data ARRAY<STRUCT<SerialNo:INT, VariableName:STRING, VariableValue:STRING>>
)
PARTITIONED BY (Pyear INT, Pmonth INT)
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde";
ALTER TABLE SerdeTest ADD
PARTITION (Pyear = 2014, Pmonth =03) LOCATION '../Test2';
The data in the file:
{"Unique_ID":"ABC6800650654751","MemberID":"KHH966375835","Data":[{"SerialNo":1,"VariableName":"Var1","VariableValue":"A_49"},{"SerialNo":2,"VariableName":"Var2","VariableValue":"B_89"},{"SerialNo":3,"VariableName":"Var3","VariableValue":"A_99"}]}
Select query that I am using:
select Data[0].SerialNo from SerdeTest where Unique_ID = 'ABC6800650654751';
However, when I run this query I get the following error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:98)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:330)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:386)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:237)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:223)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
Can anyone please suggest what I am doing wrong?

A few suggestions:
Make sure that all the Hive packages and hive-json-serde-0.2.jar have execute permission for the hadoop user.
Hive creates a derby.log file and a metastore_db directory in the Hive directory; the user invoking the Hive query must be allowed to create these files and directories.
The data LOCATION should end with a /, e.g. LOCATION '../Test2/';

In short, the working JAR is json-serde-1.3-jar-with-dependencies.jar, which can be found here. It works with STRUCT and can even ignore some malformed JSON. When creating the table, include the following:
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
LOCATION ...
If needed, it can be recompiled from here or here. I tried the first repository and it compiles fine for me after adding the necessary libs; that repository has also been updated recently.
Check here for more details.
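Putting the pieces together, the full DDL for the table above could look roughly like this (a sketch, not tested; the JAR path mirrors the question and the STRUCT field types are inferred from the sample JSON):
-- sketch: JAR path and STRUCT field types are assumptions based on the question
add jar /../xlibs/json-serde-1.3-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE SerdeTest
(Unique_ID STRING
,MemberID STRING
,Data ARRAY<STRUCT<SerialNo:INT, VariableName:STRING, VariableValue:STRING>>
)
PARTITIONED BY (Pyear INT, Pmonth INT)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true");
The partition can then be added with the same ALTER TABLE ... ADD PARTITION statement as in the question.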

Related

Relation IDs mismatch - Mapping OWL to Oracle DB with Ontop

As part of my little app I am trying to map data between my ontology and an Oracle DB with Ontop, but my first mapping is not accepted by the reasoner and it is not clear why.
As a first attempt I use the following target:
:KIS/P_PVPAT_PATIENT/{PPVPAT_PATNR} a :Patient .
and the following source:
select * from P_PVPAT_PATIENT
Here KIS is the schema, p_pvpat_patient the table and ppvpat_patnr the key. Ontop rejects the mapping with:
Caused by: it.unibz.inf.ontop.exception.InvalidMappingSourceQueriesException:
Error: Relation IDs mismatch: P_PVPAT_PATIENT v "KIS"."P_PVPAT_PATIENT" P_PVPAT_PATIENT
Problem location: source query of triplesMap
[id: MAP_PATIENT
target atoms: triple(s,p,o) with
s/RDF(http://www.semanticweb.org/grossmann/ontologies/kis-ontology#KIS/P_PVPAT_PATIENT/{}(TmpToVARCHAR2(PPVPAT_PATNR)),IRI), p/<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, o/<http://www.semanticweb.org/grossmann/ontologies/kis-ontology#Patient>
source query: select * from P_PVPAT_PATIENT]
As the error says, my source query was wrong because I forgot to use the schema in my SQL. The correct SQL is:
select * from kis.P_PVPAT_PATIENT
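Put together, the corrected mapping entry would look roughly like this (a sketch in Ontop's .obda notation; the mapping id is taken from the error message):
mappingId   MAP_PATIENT
target      :KIS/P_PVPAT_PATIENT/{PPVPAT_PATNR} a :Patient .
source      select * from kis.P_PVPAT_PATIENT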

Scala UDF complains "java.lang.ClassNotFoundException" in Flink SQL client

The whole Scala project for the UDF is here:
Flink_SQL_Client_UDF/Scala_fixed/
The steps I take to register the UDF are as follows:
①mvn scala:compile package
②cp table_api-1.0-SNAPSHOT.jar $FLINK_HOME/lib
③add the following sentence into $FLINK_HOME/conf/flink-conf.yaml
flink.execution.jars: $FLINK_HOME/lib/table_api-1.0-SNAPSHOT.jar
④create temporary function scalaupper as 'ScalaUpper';
⑤CREATE TABLE orders (
order_uid BIGINT,
product_name String,
price DECIMAL(32, 2),
order_time TIMESTAMP(3)
) WITH (
'connector' = 'datagen'
);
⑥select scalaupper(product_name) from orders;
Then I got
java.lang.ClassNotFoundException: ScalaUpper
Need your help, thanks!
Thanks for your detailed steps to reproduce the problem. I think we can solve this by using the -j option of the SQL client[1] to add the jar to the Java classpath, as sketched below; in my local environment this works. But I don't find any information about 'flink.execution.jars' in the documentation[2], so I am not sure whether that option works for the SQL client.
When registering the function in the table environment, the function catalog just does a simple validation and adds the <identifier, path> pair to a map; it does not load the class into the runtime. Only when the job invokes the function does the function catalog load the class.
[1]https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html#configuration
[2]https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html
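For example, starting the SQL client with the jar on its classpath and then registering the function might look like this (a sketch; embedded mode and the jar path are assumptions based on [1]):
./bin/sql-client.sh embedded -j $FLINK_HOME/lib/table_api-1.0-SNAPSHOT.jar
-- inside the SQL client, as before (assumes ScalaUpper is in the default package)
create temporary function scalaupper as 'ScalaUpper';
select scalaupper(product_name) from orders;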

Import JSON into ClickHouse

I create a table with this statement:
CREATE TABLE event(
date Date,
src UInt8,
channel UInt8,
deviceTypeId UInt8,
projectId UInt64,
shows UInt32,
clicks UInt32,
spent Float64
) ENGINE = MergeTree(date, (date, src, channel, projectId), 8192);
Raw data looks like:
{ "date":"2016-03-07T10:00:00+0300","src":2,"channel":18,"deviceTypeId ":101, "projectId":2363610,"shows":1232,"clicks":7,"spent":34.72,"location":"Unknown", ...}
...
The data files are loaded with the following command:
cat *.data|sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]+0300//'| clickhouse-client --query="INSERT INTO event FORMAT JSONEachRow"
clickhouse-client throws an exception:
Code: 117. DB::Exception: Unknown field found while parsing JSONEachRow format: location: (at row 1)
Is it possible to skip fields from JSON object that not presented in table description?
The latest ClickHouse release (v1.1.54023) supports the input_format_skip_unknown_fields user option, which enables skipping of unknown fields for the JSONEachRow and TSKV formats.
Try
clickhouse-client -n --query="SET input_format_skip_unknown_fields=1; INSERT INTO event FORMAT JSONEachRow;"
See more details in documentation.
Currently, it is not possible to skip unknown fields.
You may create a temporary table with the additional field, INSERT data into it, and then do an INSERT SELECT into the final table. The temporary table may use the Log engine, and INSERTs into that "staging" table will be faster than into the final MergeTree table (see the sketch below).
It is relatively easy to add the possibility of skipping unknown fields to the code (something like a 'format_skip_unknown_fields' setting).
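A sketch of that staging approach (the table name event_staging is made up here, and the single extra location column is an assumption; the real data may contain more unknown fields):
-- staging table with the extra field(s) present in the raw JSON
CREATE TABLE event_staging (
    date Date,
    src UInt8,
    channel UInt8,
    deviceTypeId UInt8,
    projectId UInt64,
    shows UInt32,
    clicks UInt32,
    spent Float64,
    location String
) ENGINE = Log;
-- load the JSONEachRow data into event_staging, then copy only the wanted columns
INSERT INTO event SELECT date, src, channel, deviceTypeId, projectId, shows, clicks, spent FROM event_staging;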

Do I or don't I need to create a table that I want to prepopulate with H2?

I'm confused by the errors I get when trying to create an in-memory H2 DB for my Spring Boot application. The relevant configuration is
db.url=jdbc:h2:mem:test;MODE=MySQL;DB_CLOSE_DELAY=-1;INIT=runscript from 'classpath:create.sql'
hibernate.hbm2ddl.auto=create
And create.sql:
CREATE TABLE `cities` (
`name` varchar(45) NOT NULL,
PRIMARY KEY (`name`)
) ;
INSERT INTO `cities` VALUES ('JAEN'),('ALBACETE');
But I get the error: Caused by: org.h2.jdbc.JdbcSQLException: Table "CITIES" already exists;
What is weird: if I remove the CREATE TABLE statement, I get:
Caused by: org.h2.jdbc.JdbcSQLException: Table "CITIES" not found;
The only thing that works is using DROP TABLE IF EXISTS, but I don't think I should need to.
What's going on? What's the proper way of pre-populating static data into an in-memory H2 DB?
1) Hibernate way: use an import.sql file, or specify files:
spring.jpa.properties.hibernate.hbm2ddl.import_files=file1.sql,file2.sql
http://docs.spring.io/spring-boot/docs/current/reference/html/howto-database-initialization.html
2) Spring Boot way: use the default schema.sql & data.sql files (see the sketch below),
or specify files through properties:
spring.datasource.schema = file1.sql
spring.datasource.data = file1.sql, file2.sql
http://docs.spring.io/autorepo/docs/spring-boot/1.0.2.RELEASE/reference/html/howto-database-initialization.html
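For the Spring Boot option, a minimal sketch would be to drop the INIT=runscript part from the JDBC URL, split create.sql into the two default files, and stop Hibernate from also creating the table (e.g. by setting hibernate.hbm2ddl.auto to none; that setting choice is an assumption about this setup):
-- schema.sql
CREATE TABLE `cities` (
`name` varchar(45) NOT NULL,
PRIMARY KEY (`name`)
);
-- data.sql
INSERT INTO `cities` VALUES ('JAEN'),('ALBACETE');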

'do_replace()' not working?

While trying ATK4 I've found a problem:
$this->api->db->dsql()->table('person')->set('id', 1)->set('name', 'Test user')->do_replace();
This does not work. I then looked a little deeper into the ATK4 source and found these lines in /opt/ipism/www/atk4/lib/DB/dsql.php:
public $sql_templates=array(
'select'=>"select [options] [field] [from] [table] [join] [where] [group] [having] [order] [limit]",
'insert'=>"insert [options_insert] into [table_noalias] ([set_fields]) values ([set_values])",
'replace'=>"replace [options_replace] into [table_noalias] ([set_fields]) values ([set_values])",
'update'=>"update [table_noalias] set [set] [where]",
'delete'=>"delete from [table_noalias] [where]",
'truncate'=>'truncate table [table_noalias]',
'describe'=>'desc [table_noalias]',
);
After changing the 'replace' line to
'replace'=>"replace into [table_noalias] ([set_fields]) values ([set_values])",
it worked for me (removing [options_replace] and appending an 's' to set_value). I'm using the latest version from git with a MySQL database connection.
But I'm not sure whether I'm using do_replace() the wrong way?
ByE...
By the way: is there a way to send fixes without creating an account on GitHub or somewhere?
Edit: Here is the output if the options_replace isn't removed from the template:
replace [options_replace] into `person` (`id`,`name`) values ("1","John Doe") [:a_2, :a]
Application Error: Database Query Failed
Exception_DB, code: 0
Additional information:
  pdo_error: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[options_replace] into `person` (`id`,`name`) values ('1' at line 1
  mode: replace
  params: :a: 1 :a_2: John Doe
  query: replace [options_replace] into `person` (`id`,`name`) values (:a,:a_2)
  template: replace [options_replace] into [table_noalias] ([set_fields]) values ([set_values])
/opt/ipism/www/atk4/lib/DB/dsql.php:1519
Stack trace:
File | Object Name | Stack Trace
/opt/ipism/www/atk4/lib/BaseException.php:63 | Exception_DB | Exception_DB->collectBasicData(Null)
/opt/ipism/www/atk4/lib/AbstractObject.php:545 | Exception_DB | Exception_DB->__construct("Database Query Failed", Null)
/opt/ipism/www/atk4/lib/DB/dsql.php:1519 | sample_project_db_db_dsql_mysql | DB_dsql_mysql->exception("Database Query Failed")
/opt/ipism/www/atk4/lib/DB/dsql.php:1586 | sample_project_db_db_dsql_mysql | DB_dsql_mysql->execute()
/opt/ipism/www/atk4/lib/DB/dsql.php:1624 | sample_project_db_db_dsql_mysql | DB_dsql_mysql->replace()
/opt/ipism/www/page/test.php:40 | sample_project_db_db_dsql_mysql | DB_dsql_mysql->do_replace()
/opt/ipism/www/atk4/lib/AbstractObject.php:306 | sample_project_test | page_test->init()
/opt/ipism/www/atk4/lib/ApiFrontend.php:130 | sample_project | Frontend->add("page_test", "test", "Content")
/opt/ipism/www/atk4/lib/ApiWeb.php:428 | sample_project | Frontend->layout_Content()
/opt/ipism/www/atk4/lib/ApiFrontend.php:39 | sample_project | Frontend->addLayout("Content")
/opt/ipism/www/atk4/lib/ApiWeb.php:275 | sample_project | Frontend->initLayout()
/opt/ipism/www/index.php:15 | sample_project | Frontend->main()
Note: To hide this information from your users, add $config['logger']['web_output']=false to your config.php file. Refer to documentation on 'Logger' for alternative logging options
Replace is similar to "insert" by its nature, but instead of failing when the primary key is duplicated, it replaces the value.
Please add ->debug() to your line before do_replace() and give me the output, which would help me understand why that parameter needs removing.
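For example, the original call with debugging turned on (a sketch; assumes debug() can be chained like the other dsql calls):
$this->api->db->dsql()->table('person')->set('id', 1)->set('name', 'Test user')->debug()->do_replace(); // sketch: debug() chaining is an assumption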
set_value does look like a typo; I have changed it and committed the fix to master: https://github.com/atk4/atk4/commit/24b20865b9e3345a8e7504dfb68b7ef96335009e
The best way to submit changes is by creating a pull request; the best way to report issues is currently through "Issues" on GitHub.
