I am using Sqoop to import data from SQL Server to local HDFS, using a simple free-form query to pull some 10 rows from a table. Below is the sqoop command that I execute from the terminal:
sqoop import --connect 'jdbc:sqlserver://xx.xx.xx.xx;username=xx;password=xxxxx;database=DBName' --query "SELECT top 10 OrderID from DJShopcart_OrderItems where \$CONDITIONS" --split-by "OrderID" --target-dir /work/gearpurchase
When I execute this from my local machine, I get the following exception:
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The index 2 is out of range.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:191)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.verifyValidColumnIndex(SQLServerResultSet.java:543)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getterGetColumn(SQLServerResultSet.java:2066)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:2099)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:2084)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getInt(SQLServerResultSet.java:2327)
at org.apache.sqoop.lib.JdbcWritableBridge.readInteger(JdbcWritableBridge.java:52)
at com.cloudera.sqoop.lib.JdbcWritableBridge.readInteger(JdbcWritableBridge.java:53)
at QueryResult.readFields(QueryResult.java:105)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:244)
If I import 2 columns, the exception says index 3 is out of range.
I also checked the SQLServerResultSet class documentation to understand what could be causing the exception, but to no avail; I only got more confused by concepts like client-side vs. server-side cursors.
No matter what I try, I can't get this simple free-form query to import data from SQL Server.
Sqoop version: 1.4.6
Hadoop: 2.7.3
Machine: Ubuntu 16.04
Please help me out. Thanks in advance.
If I import 2 columns, the exception says index 3 is out of range.
Then the fault lies with Sqoop:
at org.apache.sqoop.lib.JdbcWritableBridge.readInteger(JdbcWritableBridge.java:52)
as it is passing 3 as an argument to
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getInt(SQLServerResultSet.java:2327)
when the result only contains 2 columns and the valid column indexes are 1 and 2.
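One way to confirm that the result set really contains only the selected column, and that the out-of-range index comes from Sqoop's generated QueryResult.readFields rather than from the driver, is to run the same free-form query outside Sqoop. A minimal pyodbc sketch, with placeholder connection details, an assumed ODBC driver name, and $CONDITIONS replaced by a trivial predicate:
# Hypothetical sanity check outside Sqoop: run the same free-form query through
# pyodbc and inspect the column metadata the driver actually reports.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=xx.xx.xx.xx;DATABASE=DBName;UID=xx;PWD=xxxxx"
)
cur = conn.cursor()
cur.execute("SELECT top 10 OrderID from DJShopcart_OrderItems where 1=1")
# cursor.description has one entry per returned column; here it should list only
# OrderID, so any read of column index 2 in the generated code is out of range.
print(len(cur.description), [col[0] for col in cur.description])
conn.close()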
Related
As part of my little app I am trying to map data between my ontology and an Oracle DB with ontop, but my first mapping is not accepted by the reasoner and it is not clear why.
As my first attempt I use the following target:
:KIS/P_PVPAT_PATIENT/{PPVPAT_PATNR} a :Patient .
and the following source:
select * from P_PVPAT_PATIENT
Here KIS is the schema, p_pvpat_patient the table and ppvpat_patnr the key.
Caused by: it.unibz.inf.ontop.exception.InvalidMappingSourceQueriesException:
Error: Relation IDs mismatch: P_PVPAT_PATIENT v "KIS"."P_PVPAT_PATIENT" P_PVPAT_PATIENT
Problem location: source query of triplesMap
[id: MAP_PATIENT
target atoms: triple(s,p,o) with
s/RDF(http://www.semanticweb.org/grossmann/ontologies/kis-ontology#KIS/P_PVPAT_PATIENT/{}(TmpToVARCHAR2(PPVPAT_PATNR)),IRI), p/<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, o/<http://www.semanticweb.org/grossmann/ontologies/kis-ontology#Patient>
source query: select * from P_PVPAT_PATIENT]
As the error says, my source query was wrong because I forgot to include the schema in my SQL.
The correct SQL is:
select * from kis.P_PVPAT_PATIENT
I'm using Flask-SQLAlchemy to update an MSSQL record, but the update returns -1.
Library versions:
SQLAlchemy 1.3.11
Flask-SQLAlchemy 2.4.1
pyodbc 4.0.27
flask 1.1.1
Code part 1:
ret = db.session.query(XXX).filter_by(id=1).update({"xxx": "xxxx"})
print("ret", ret)
db.session.commit()
ret is -1, but the record has been modified.
Code part 2:
obj = XXX.query.filter_by(id=q).first()
obj.xx = "xxx"
db.session.commit()
Raise error:
sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 'XXX' expected to update 1 row(s); -1 were matched.
And the modification did not succeed.
According to the SQLAlchemy documentation, there is currently a limitation where some versions of the SQL Server drivers do not return the number of records for UPDATE and DELETE statements. I am currently facing the issue on Linux, but it works fine on Windows.
Here is also a related SQLAlchemy issue.
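As a quick check (assuming the same Flask-SQLAlchemy db object as in the question), the dialect exposes capability flags that describe whether SQLAlchemy expects the driver to report accurate rowcounts; they are only a rough indicator, but they show how the library models this limitation:
# Dialect capability flags: whether rowcounts are considered reliable for plain
# UPDATE/DELETE, and for statements that use OUTPUT/RETURNING.
print(db.engine.dialect.supports_sane_rowcount)
print(getattr(db.engine.dialect, "supports_sane_rowcount_returning", None))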
I needed a column to use as a version indicator. The documentation recommends creating a SQL Server ROWVERSION column; however, SQLAlchemy/pyodbc, again on Linux, was not able to assign the FetchedValue() coming back from the database as bytes into the field. I also tried a DATETIME2 field, but again hit SQLAlchemy accuracy problems when mapping the field (precision 7) to Python.
I ended up implementing the following change:
Since I had the ID column already assigned by the database (IDENTITY), I used that field as the version indicator.
__mapper_args__ = {
'version_id_col': id_column,
'version_id_generator': False,
}
The SQLAlchemy UPDATE statement now looks like:
UPDATE <TABLE> SET <column>=? OUTPUT inserted.<ID-COLUMN> WHERE <TABLE>.<ID-COLUMN> = ? AND <TABLE>.<ID-COLUMN> = ?
[('updated data', 123456, 123456)]
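Put together, a minimal Flask-SQLAlchemy model sketch for this workaround (the class and column names are placeholders taken from the question, not the actual schema):
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class XXX(db.Model):
    __tablename__ = "XXX"

    id = db.Column(db.Integer, primary_key=True)  # IDENTITY value assigned by SQL Server
    xxx = db.Column(db.String(50))

    __mapper_args__ = {
        "version_id_col": id,           # reuse the database-assigned id as the version indicator
        "version_id_generator": False,  # do not let SQLAlchemy generate version values itself
    }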
Maybe it is a case of me looking at this for too long, but I have an Oracle update query I am trying to run. I have verified that the query works with hardcoded values in SQL Developer; however, when I run it from my Mule flow it fails. Can anybody tell me what I am doing wrong?
Here is the query:
<db:update config-ref="DBConf" doc:name="abcd">
<db:dynamic-query><![CDATA[UPDATE myTable
SET TYPE= 'Entry',
ENTERED_DATE=SYSDATE,
ENTRY_BY= 2345,
ENTRY_DATE=TO_DATE('#[flowVars.entryDate]','YYYY-MM-DD')
WHERE ID = 'abcd1234']]>
</db:dynamic-query>
</db:update>
The flowVars.entryDate value is '2017-05-10'.
This throws the following error:
Message : ORA-01841: (full) year must be between -4713 and +9999, and not be 0
(java.sql.SQLDataException). Message payload is of type: Integer
As I said, the same query works in SQL Developer but not in Mule. Can anybody provide any input?
You can find an answer to the same problem at the following link:
Oracle: year must be between -4713 and +9999, and not be 0
Try this: TO_DATE('2012-05-12','yyyy-mm-dd').
Also delete the quotes around #[flowVars.entryDate].
I can't import data into Cassandra because I am using DSE Solr now, and as I can see it created a solr_query (virtual) column in my table.
So I tried COPY table FROM 'file' WITH SKIPCOLS = "solr_query";
but I still get the same error:
Failed to import 10 rows: ParseError - Invalid row length 9 should be 10 - given up without retries.
So how can I import the data and ignore the solr_query column?
The COPY command accepts the columns to import as a list. Try listing them, leaving out the solr_query column, and it should be OK:
COPY table (colA, colB, colC,...) FROM 'file'
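If the table has many columns, the list can be generated rather than typed by hand. A hypothetical helper using the DataStax Python driver (the keyspace and table names are placeholders, and it assumes a Cassandra/DSE version that has the system_schema keyspace):
# Build the COPY column list from the schema metadata, leaving out solr_query.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
rows = session.execute(
    "SELECT column_name FROM system_schema.columns "
    "WHERE keyspace_name = %s AND table_name = %s",
    ("my_keyspace", "table"),
)
columns = [row.column_name for row in rows if row.column_name != "solr_query"]
print("COPY table ({}) FROM 'file'".format(", ".join(columns)))
cluster.shutdown()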
I have created a table:
add jar /../xlibs/hive-json-serde-0.2.jar;
CREATE EXTERNAL TABLE SerdeTest
(Unique_ID STRING
,MemberID STRING
,Data ARRAY<STRUCT<SerialNo:INT, VariableName:STRING, VariableValue:STRING>>
)
PARTITIONED BY (Pyear INT, Pmonth INT)
ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde";
ALTER TABLE SerdeTest ADD
PARTITION (Pyear = 2014, Pmonth =03) LOCATION '../Test2';
The data in the file:
{"Unique_ID":"ABC6800650654751","MemberID":"KHH966375835","Data":[{"SerialNo":1,"VariableName":"Var1","VariableValue":"A_49"},{"SerialNo":2,"VariableName":"Var2","VariableValue":"B_89"},{""SerialNo":3,"VariableName":"Var3","VariableValue":"A_99"}]}
Select query that I am using:
select Data[0].SerialNo from SerdeTest where Unique_ID = 'ABC6800650654751';
however, when I run this query I get the following error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: org.json.JSONArray cannot be cast to [Ljava.lang.Object;
at org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:98)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:330)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:386)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:237)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:223)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
Can anyone please suggest what I am doing wrong?
A few suggestions:
Make sure that all the Hive packages and hive-json-serde-0.2.jar have execute permission for the hadoop user.
Hive creates a derby.log file and a metastore_db directory in the Hive directory. The user invoking the Hive query must be allowed to create files and directories there.
The location for the data should have a / at the end, e.g. LOCATION '../Test2/';
In short, the working JAR is json-serde-1.3-jar-with-dependencies.jar, which can be found here. This one works with STRUCT and can even ignore some malformed JSON. When creating the table, include the following:
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
LOCATION ...
If needed, it is possible to recompile it from here or here. I tried the first repository and it compiles fine for me after adding the necessary libs. The repository has also been updated recently.
Check for more details here.