Error when copying data into Variant table from AVRO file - snowflake-cloud-data-platform

I am completing a Snowflake university workshop but have run into a problem. The course provides an AVRO file and asks us to load the data into a table with a VARIANT column. However, when I run the COPY INTO command I get this error:
Number of columns in file (11) does not match that of the corresponding table (1), use file format option error_on_column_count_mismatch=false to ignore this error File 'iot_files/iot_files_sample_output.avro', line 1, character 827 Row 1, column "IOT_AVRO_DATA"[11] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.
These are the instructions given by the course:
CREATE OR REPLACE TABLE IOT_AVRO_DATA
(mycolumn VARIANT);
copy INTO IOT_AVRO_DATA
FROM @GOOGLE_BUCKET_SFHOL/iot_files/iot_files_sample_output.avro;
FILE_FORMAT = (type = AVRO);
It looks like there is a mismatch between the number of columns in the file and in the table.
Any help or advice would be appreciated; I tried reaching out to Snowflake via the workshop but they have not responded.

Are you sure your AVRO file is not corrupted?
The following works fine for me:
Upload a sample AVRO file to my stage (userdata1.avro, taken from here):
spanaite#(no warehouse)@SERGIU_DB.(no schema)>put file:///Users/spanaite/Downloads/userdata1.avro @~;
+----------------+-------------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source         | target            | source_size | target_size | source_compression | target_compression | status   | message |
|----------------+-------------------+-------------+-------------+--------------------+--------------------+----------+---------|
| userdata1.avro | userdata1.avro.gz |       93561 |       79248 | NONE               | GZIP               | UPLOADED |         |
+----------------+-------------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 3.026s
spanaite#(no warehouse)@SERGIU_DB.(no schema)>
Create a table and load the avro file:
create or replace table test_avro(mycolumn VARIANT);
copy into test_avro from @~/userdata1.avro.gz file_format = (type = AVRO);
select * from test_avro;
Try with one of the sample files from the link I posted above.
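Once the COPY succeeds, you can sanity-check the load by pulling individual attributes out of the VARIANT column; a quick sketch (the field names assume the userdata1.avro sample above - adjust them for your own file):
-- Peek at a couple of attributes stored inside the VARIANT column
select
    mycolumn:first_name::string as first_name,
    mycolumn:email::string      as email
from test_avro
limit 10;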

Related

How to check if a request located in JDBC_SESSION_INIT_STATEMENT is working? DataFrameReader

I am trying to connect to SQL Server with spark-jdbc, using JDBC_SESSION_INIT_STATEMENT to create a temporary table and then download data from the temporary table in the main query.
I have the following code:
// df is org.apache.spark.sql.DataFrameReader
val s = """select * into #tmp_table from ( SELECT op.ID,
          |       op.Date,
          |       op.DocumentID,
          |       op.Amount,
          |       op.AmountCurr,
          |       op.CurrencyID,
          |       operson.ObjectTypeId AS PersonOT,
          |       op.PersonID,
          |       ocontract.ObjectTypeId AS ContractOT,
          |       op.ContractID,
          |       op.DocNum,
          |       op.MomentCreate,
          |       op.ObjectTypeID,
          |       op.OwnerObjectID
          |FROM dbo.Operation op With (Index = IX_Operation_Date) -- Without the hint it sometimes falls back to scanning the whole table
          |LEFT JOIN dbo.Object ocontract ON op.ContractID = ocontract.ID
          |LEFT JOIN dbo.Object operson ON op.PersonID = operson.ID
          |WHERE op.Date>='2019-01-01' and op.Date<'2020-01-01' AND 1=1
          |) wrap_for_single_connect
          |OPTION (LOOP JOIN, FORCE ORDER, MAX_GRANT_PERCENT=25)""".stripMargin

df
  .option(JDBCOptions.JDBC_SESSION_INIT_STATEMENT, s)
  .jdbc(
    jdbcUrl,
    "(select * from tempdb.#tmp_table) sub",
    connectionProps)
I get com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '#tmp_table'.
And I have a feeling that JDBC_SESSION_INIT_STATEMENT is not working, because I deliberately tried to mess up the request and still got the Invalid object error.
How can I check if the request is working in JDBC_SESSION_INIT_STATEMENT?
One way to know whether your JDBCOptions.JDBC_SESSION_INIT_STATEMENT is executed is to enable the INFO logging level for the org.apache.spark.sql.execution.datasources.jdbc logger.
That should trigger this line and print out the following message to the logs:
Executing sessionInitStatement: [sql]
Given the comment I don't think you should use it to create a source table to load records from:
// This executes a generic SQL statement (or PL/SQL block) before reading
// the table/query via JDBC. Use this feature to initialize the database
// session environment, e.g. for optimizations and/or troubleshooting.
You should use the dbtable or query parameter instead.
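For example, rather than materializing #tmp_table in the init statement, the inner SELECT itself can be handed to dbtable (or query) as a derived table. A rough sketch, trimmed to a few of the columns from the question and without the OPTION (...) hints, which are not allowed inside a subquery:
-- Passed as the dbtable value: a parenthesized subquery with an alias
(SELECT op.ID,
        op.Date,
        op.Amount,
        operson.ObjectTypeId AS PersonOT,
        ocontract.ObjectTypeId AS ContractOT
 FROM dbo.Operation op
 LEFT JOIN dbo.Object ocontract ON op.ContractID = ocontract.ID
 LEFT JOIN dbo.Object operson ON op.PersonID = operson.ID
 WHERE op.Date >= '2019-01-01' AND op.Date < '2020-01-01') wrap_for_single_connect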

403 error running data unload with snowsql GET

I'm having issues testing a data unload flow from Snowflake using the GET command to store the files on my local machine.
Following the documentation here, it should be as simple as creating a stage, copying the data I want to that stage, and then running a snowsql command locally to retrieve the files.
I'm on Windows 10, running the following snowsql command to try to unload the data, against a database populated with the sample TPC-H data that Snowflake provides:
snowsql -a <account id> -u <username> -q "
USE DATABASE TESTDB;
CREATE OR REPLACE STAGE TESTSNOWFLAKESTAGE;
copy into @TESTSNOWFLAKESTAGE/supplier from SUPPLIER;
GET @TESTSNOWFLAKESTAGE file://C:/Users/<local user>/Downloads/unload;"
All commands run successfully, except for the final GET:
SnowSQL * v1.2.14
Type SQL statements or !help
+----------------------------------+
| status                           |
|----------------------------------|
| Statement executed successfully. |
+----------------------------------+
1 Row(s) produced. Time Elapsed: 0.121s
+-----------------------------------------------------+
| status                                              |
|-----------------------------------------------------|
| Stage area TESTSNOWFLAKESTAGE successfully created. |
+-----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.293s
+---------------+-------------+--------------+
| rows_unloaded | input_bytes | output_bytes |
|---------------+-------------+--------------|
|        100000 |    14137839 |      5636225 |
+---------------+-------------+--------------+
1 Row(s) produced. Time Elapsed: 7.548s
+-----------------------+------+--------+------------------------------------------------------------------------------------------------------+
| file                  | size | status | message                                                                                              |
|-----------------------+------+--------+------------------------------------------------------------------------------------------------------|
| supplier_0_0_0.csv.gz |   -1 | ERROR  | An error occurred (403) when calling the HeadObject operation: Forbidden, file=supplier_0_0_0.csv.gz |
+-----------------------+------+--------+------------------------------------------------------------------------------------------------------+
1 Row(s) produced. Time Elapsed: 1.434s
This 403 looks like it's coming from the S3 instance backing my Snowflake account, but that's part of the abstracted service layer provided by Snowflake, so I'm not sure where I would have to go to flip auth switches.
Any guidance is much appreciated.
You need to use Windows-based slashes in your local file path. So, assuming that, per @NickW's point, you are filling in your local user correctly, the format should be like the following:
file://C:\Users\<local user>\Downloads
There are some examples in the documentation for this here:
https://docs.snowflake.com/en/sql-reference/sql/get.html#required-parameters
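Putting that together with the commands from the question, the GET would then look roughly like this (a sketch; the stage name and the <local user> placeholder are unchanged):
GET @TESTSNOWFLAKESTAGE file://C:\Users\<local user>\Downloads\unload;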

SSRS System.InvalidCastException - at OracleDataReader.GetDecimal(Int32 i)

I have an SSRS report that was pointed at SQL Server views, which in turn pointed to Oracle tables. I edited the SSRS report Dataset to query directly from the Oracle DB. It seemed like a very simple change until I got this error message:
System.InvalidCastException: Specified cast is not valid.
With the following details...
Field ‘UOM_QTY’, and it also says it occurred at
Oracle.ManagedDataAccess.Client.OracleDataReader.GetDecimal(Int32 i).
The SELECT statement on that field is pretty simple:
, (DELV_RECEIPT.INV_LBS/ITEM_UOM_XREF.CONV_TO_LBS) AS UOM_QTY
Does anyone know what would cause this message, and how to resolve the error? My objective is to use the Oracle data source instead of SQL Server.
Error 1
Severity Code Description Project File Line Suppression State
Warning [rsErrorReadingDataSetField] The dataset ‘dsIngredientCosts’ contains a definition for the Field ‘UOM_QTY’. The data extension returned an error during reading the field. System.InvalidCastException: Specified cast is not valid.
at Oracle.ManagedDataAccess.Client.OracleDataReader.GetDecimal(Int32 i)
at Oracle.ManagedDataAccess.Client.OracleDataReader.GetValue(Int32 i)
at Microsoft.ReportingServices.DataExtensions.DataReaderWrapper.GetValue(Int32 fieldIndex)
at Microsoft.ReportingServices.DataExtensions.MappingDataReader.GetFieldValue(Int32 aliasIndex) C:\Users\bl0040\Documents\Visual Studio 2015\Projects\SSRS\Project_ssrs2016\Subscription Reports\Feed Ingredient Weekly Price Avg.rdl 0
Error 2
Severity Code Description Project File Line Suppression State
Warning [rsMissingFieldInDataSet] The dataset ‘dsIngredientCosts’ contains a definition for the Field ‘UOM_QTY’. This field is missing from the returned result set from the data source. C:\Users\bl0040\Documents\Visual Studio 2015\Projects\SSRS\Project_ssrs2016\Subscription Reports\Feed Ingredient Weekly Price Avg.rdl 0
Source Tables:
+------------+---------------+-------------+---------------+-----------+
| Source     | TABLE_NAME    | COLUMN_NAME | DataSize      | COLUMN_ID |
+------------+---------------+-------------+---------------+-----------+
| ORACLE     | DELV_RECEIPT  | INV_LBS     | NUMBER (7,0)  | 66        |
+------------+---------------+-------------+---------------+-----------+
| ORACLE     | ITEM_UOM_XREF | CONV_TO_LBS | NUMBER (9,4)  | 3         |
+------------+---------------+-------------+---------------+-----------+
| SQL SERVER | DELV_RECEIPT  | INV_LBS     | numeric (7,0) | 66        |
+------------+---------------+-------------+---------------+-----------+
| SQL SERVER | ITEM_UOM_XREF | CONV_TO_LBS | numeric (9,4) | 3         |
+------------+---------------+-------------+---------------+-----------+
The error went away after adding a datatype conversion statement to the data selection.
, CAST(DELV_RECEIPT.INV_LBS/ITEM_UOM_XREF.CONV_TO_LBS AS NUMERIC(9,4)) AS UOM_QTY
Can anyone provide some information on why the original query would be a problem and why the CAST would fix these errors? I tried casting the results because someone on the Code Project forum said...
why don't you use typed datasets? you get such head aches just because of not coding in a type-safe manner. you have a dataset designer in the IDE which makes the life better, safer, easier and you don't use it. I really can't understand.
Here is an approach to fix this error with an extension method instead of modifying the SQL query.
public static Decimal MyGetDecimal(this OracleDataReader reader, int i)
{
    try
    {
        return reader.GetDecimal(i);
    }
    catch (System.InvalidCastException)
    {
        // Fall back to OracleDecimal and reduce its precision so the value fits into a .NET decimal.
        Oracle.ManagedDataAccess.Types.OracleDecimal hlp = reader.GetOracleDecimal(i);
        Oracle.ManagedDataAccess.Types.OracleDecimal hlp2 = Oracle.ManagedDataAccess.Types.OracleDecimal.SetPrecision(hlp, 27);
        return hlp2.Value;
    }
}
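For context on why the original query fails and the CAST (or the SetPrecision call above) fixes it: dividing NUMBER(7,0) by NUMBER(9,4) in Oracle produces an unconstrained NUMBER that can carry far more digits than .NET's System.Decimal (28-29 significant digits) can represent, so GetDecimal throws the InvalidCastException. A minimal sketch to illustrate the difference (the literal values are only illustrative):
-- The unconstrained quotient carries more digits than System.Decimal can hold.
SELECT 1/3 AS raw_ratio FROM dual;

-- Constraining precision and scale keeps the value representable on the .NET side.
SELECT CAST(1/3 AS NUMBER(9,4)) AS cast_ratio FROM dual;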
Thank you for this but what happens if your query looks like:
SELECT x.* from x
and .GetDecimal appears nowhere?
Any suggestions in that case? I have created a function in Oracle itself that rounds all values in a result set to avoid this for basic select statements, but this seems wrong for loading updateable datasets...
Obviously this is an old-school approach to getting data.

Sqoop & Hadoop - How to join/merge old data and new data imported by Sqoop in lastmodified mode?

Background:
I have a table with the following schema on a SQL Server. Updates to existing rows are possible and new rows are also added to this table.
unique_id | user_id | last_login_date       | count
123-111   | 111     | 2016-06-18 19:07:00.0 | 180
124-100   | 100     | 2016-06-02 10:27:00.0 | 50
I am using Sqoop to add incremental updates in lastmodified mode. My --check-column parameter is the last_login_date column. In my first run, I got the above two records into Hadoop - let's call this the current data. I noted that the last value (the max value of the check column from this first import) is 2016-06-18 19:07:00.0.
Assuming there is a change on the SQL server side, I now have the following changes on the SQL server side:
unique_id | user_id | last_login_date       | count
123-111   | 111     | 2016-06-25 20:10:00.0 | 200
124-100   | 100     | 2016-06-02 10:27:00.0 | 50
125-500   | 500     | 2016-06-28 19:54:00.0 | 1
I have the row 123-111 updated with a more recent last_login_date value and the count column has also been updated. I also have a new row 125-500 added.
On my second run, Sqoop looks at all rows with a last_login_date greater than my known last value from the previous import, 2016-06-18 19:07:00.0.
This gives me only the changed data, i.e. the 123-111 and 125-500 records. Let's call this the new data.
Question
How do I do a merge join in Hadoop/Hive using the current data and the new data so that I end up with the updated version of 123-111, 124-100, and the newly added 125-500?
Changed data load using Sqoop is a two-phase process.
1st phase - load the changed data into a temp (staging) table using the sqoop import utility.
2nd phase - merge the changed data with the old data using the sqoop-merge utility.
If the table is small (say, a few million records), then use a full load with sqoop import.
Sometimes it's possible to load only the latest partition. In that case, use the sqoop import utility to load the partition with a custom query; then, instead of merging, simply INSERT OVERWRITE the loaded partition into the target table, or copy the files - this will work faster than sqoop-merge.
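If you would rather do the reconciliation in Hive instead of sqoop-merge, a minimal sketch of the merge join looks like this (current_data, new_data, and merged_data are placeholder table names for the current data, the new data, and the target; the idea is to keep the latest row per unique_id):
-- Union the previously imported data with the newly imported delta,
-- then keep only the most recent row per unique_id.
INSERT OVERWRITE TABLE merged_data
SELECT unique_id, user_id, last_login_date, `count`
FROM (
    SELECT u.*,
           ROW_NUMBER() OVER (PARTITION BY u.unique_id
                              ORDER BY u.last_login_date DESC) AS rn
    FROM (
        SELECT * FROM current_data
        UNION ALL
        SELECT * FROM new_data
    ) u
) ranked
WHERE rn = 1;
This yields the updated 123-111, the unchanged 124-100, and the new 125-500 in a single pass.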
You can change the existing Sqoop query (by specifying a new custom query) to get ALL the data from the source table instead of getting only the changed data. Refer to using_sqoop_to_move_data_into_hive. This would be the simplest way to accomplish this, i.e. doing a full data refresh instead of applying deltas.

Execute stored procedure with DbSlim with FitNesse (Selenium, Xebium)

https://github.com/markfink/dbslim
I'd like to execute stored procedures with DbSlim using FitNesse (Selenium, Xebium).
Now, what I tried to do is:
!define dbQuerySelectCustomerbalance (
execute dbo.uspLogError
)
| script | Db Slim Select Query | !-${dbQuerySelectCustomerbalance}-! |
which gives a green indicator;
however, Microsoft SQL Server Profiler shows no actions/logging...
So what I'd like to know is: is it possible to use DbSlim for executing stored procedures, and if yes, what is the correct way to do it?
By the way, I have the connection to the database on one page, and on the query page I included the connection to the database. (Is that OK?)
Take out the !- ... -!. It is used to escape wikified words. But in this case you want it to be translated to the actual query.
!define dbQuerySelectCustomerbalance ( execute dbo.uspLogError )
| script | Db Slim Select Query | ${dbQuerySelectCustomerbalance} |
| show | data by column index | 1 | and row index | 1 |
You can add the last line, which outputs the first column of the first row, for testing purposes if your SP returns some result (or you can create one simple SP just to test this out).
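If you need a throwaway procedure for that smoke test, something like the following would do (the procedure name and result text are arbitrary):
-- Minimal stored procedure returning a single row, so the
-- "show | data by column index | 1 | and row index | 1" line has output to display.
CREATE PROCEDURE dbo.uspDbSlimSmokeTest
AS
BEGIN
    SELECT 'DbSlim reached the database' AS result;
END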
Specifying the connection anywhere before this block will be fine, be it on the same page or in a SetUp/SuiteSetUp/normal page included/executed before.
