We are migrating our code from Hive to Snowflake, so the Hive MAP type is migrated to a Snowflake VARIANT. However, when we loaded data into the Snowflake table, we saw additional KEY and VALUE strings in our data.
Hive MAP data -
{"SD10":"","SD9":""}
SnowSQL Variant data -
[ { "key": "SD10", "value": "" }, { "key": "SD9", "value": "" }]
I am using stage and ORC file to load data from Hadoop to Snowflake.
Is there a way to store the map data as-is in the Snowflake VARIANT? Basically, I don't want the additional KEY and VALUE strings.
You can do this with the PARSE_JSON function:
-- Returns { "SD10": "", "SD9": "" }
select parse_json('{"SD10":"","SD9":""}') as JSON;
You can add this to your COPY INTO statement. You can see different options for this in the Snowflake example for Parquet files, which is very similar to loading ORC files in Snowflake: https://docs.snowflake.net/manuals/user-guide/script-data-load-transform-parquet.html.
You can also find more information on transformations on load here: https://docs.snowflake.net/manuals/user-guide/data-load-transform.html
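As a rough sketch, adding PARSE_JSON to a COPY with a SELECT transformation against an ORC stage could look like this (the table, column, and stage names here are hypothetical):

copy into my_table (my_map_column)
from (
    select parse_json($1:"my_map_column"::string)
    from @my_orc_stage
);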
You can do this with a UDF.
create or replace function list_to_dict(v variant)
returns variant
language javascript
as '
    function listToDict(input) {
        var returnDictionary = {};
        var inputLength = input.length;
        for (var i = 0; i < inputLength; i++) {
            returnDictionary[input[i]["key"]] = input[i]["value"];
        }
        return returnDictionary;
    }
    return listToDict(V);
';
select list_to_dict(my_map_column)['SD10'] from my_table;
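If you prefer to stay in SQL after loading, a rough alternative sketch (assuming a hypothetical table my_table with an id column and the key/value array in a VARIANT column my_map_column) is to flatten the array and rebuild the object with OBJECT_AGG:

select t.id,
       object_agg(f.value:key::string, f.value:value) as map_col
from my_table t,
     lateral flatten(input => t.my_map_column) f
group by t.id;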
I want to load string data into BigQuery, compressed with Python's zlib library.
Here is an example code that uses zlib to generate data:
import zlib
import pandas as pd
string = 'abs'
df = pd.DataFrame()
data = zlib.compress(bytearray(string, encoding='utf-8'), -1)
df.append({'id' : 1, 'data' : data}, ignore_index=True)
I've also tried both methods provided by the bigquery API, but both of them give me an error.
The schema is:
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", bigquery.enums.SqlTypeNames.NUMERIC),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.BYTES),
    ],
    write_disposition="WRITE_APPEND",
)
Examples of methods I have tried are:
1. bigquery API
job = bigquery_client.load_table_from_dataframe(
    df, table, job_config=job_config
)
job.result()
2. pandas_gbq
df.to_gbq(destination_table, project_id, if_exists='append')
However, both give similar errors.
1. error
pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)
2. error
pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
Is there any way to solve this?
I want to load a Python bytestring as BigQuery BYTES data.
Thank you
The problem isn't coming from the insertion of your zlib-compressed data. The error occurs when inserting the value of your id key (the value 1 in the DataFrame) into the NUMERIC data type in BigQuery.
The easiest solution for this is to change the datatype of the id column in your BigQuery schema from NUMERIC to INTEGER.
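For example, the destination table could be defined with INT64 instead of NUMERIC (a sketch using the same placeholder table name as the code below):

CREATE OR REPLACE TABLE `my-project.my-dataset.my-table` (
    id INT64,
    data BYTES
);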
However, if you really need your schema to use the NUMERIC datatype, you can convert the DataFrame's id value with Python's decimal library (as derived from this SO post) before loading it into BigQuery.
You may refer to the sample code below.
from google.cloud import bigquery
import pandas
import zlib
import decimal

# Construct a BigQuery client object.
client = bigquery.Client()

# Set table_id to the ID of the table to create.
table_id = "my-project.my-dataset.my-table"

string = 'abs'
df = pandas.DataFrame()
data = zlib.compress(bytearray(string, encoding='utf-8'), -1)
record = df.append({'id': 1, 'data': data}, ignore_index=True)
df_2 = pandas.DataFrame(record)
df_2['id'] = df_2['id'].astype(str).map(decimal.Decimal)

dataframe = pandas.DataFrame(
    df_2,
    # In the loaded table, the column order reflects the order of the
    # columns in the DataFrame.
    columns=[
        "id",
        "data",
    ],
)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", bigquery.enums.SqlTypeNames.NUMERIC),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.BYTES),
    ],
    write_disposition="WRITE_APPEND",
)

job = client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
My data is as follows:
[
  {
    "InvestorID": "10014-49",
    "InvestorName": "Blackstone",
    "LastUpdated": "11/23/2021"
  },
  {
    "InvestorID": "15713-74",
    "InvestorName": "Bay Grove Capital",
    "LastUpdated": "11/19/2021"
  }
]
What I have tried so far:
-- created table
CREATE OR REPLACE TABLE STG_PB_INVESTOR (
    Investor_ID string, Investor_Name string, Last_Updated DATETIME
);

-- created file format
create or replace file format investorformat
    type = 'JSON'
    strip_outer_array = true;

-- created stage
create or replace stage investor_stage
    file_format = investorformat;

copy into STG_PB_INVESTOR from @investor_stage;
I am getting an error:
SQL compilation error: JSON file format can produce one and only one column of type variant or object or array. Use CSV file format if you want to load more than one column.
You should be loading your JSON data into a table with a single column that is a VARIANT. Once in Snowflake you can either flatten that data out with a view or a subsequent table load. You could also flatten it on the way in using a SELECT on your COPY statement, but that tends to be a little slower.
Try something like this:
CREATE OR REPLACE TABLE STG_PB_INVESTOR_JSON (
    var variant
);

create or replace file format investorformat
    type = 'JSON'
    strip_outer_array = true;

create or replace stage investor_stage
    file_format = investorformat;

copy into STG_PB_INVESTOR_JSON from @investor_stage;

create or replace table STG_PB_INVESTOR as
SELECT
    var:InvestorID::string as Investor_id,
    var:InvestorName::string as Investor_Name,
    TO_DATE(var:LastUpdated::string, 'MM/DD/YYYY') as last_updated
FROM STG_PB_INVESTOR_JSON;
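If you would rather flatten on the way in, as mentioned above, a sketch of the COPY with a SELECT transformation (reusing the same stage and the original STG_PB_INVESTOR table) would look roughly like this; note that COPY transformations support only a subset of functions, so verify that TO_DATE is accepted in your case:

copy into STG_PB_INVESTOR (Investor_ID, Investor_Name, Last_Updated)
from (
    select $1:InvestorID::string,
           $1:InvestorName::string,
           to_date($1:LastUpdated::string, 'MM/DD/YYYY')
    from @investor_stage
);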
I have an external table INV_EXT_TBL (one VARIANT column named "VALUE") in Snowflake, and it has 6000 rows (each row is a JSON file). The JSON records have doubled double quotes since they are in dynamo_json format.
What is the best approach to parse all the JSON files and convert them into table format to run SQL queries? I have given a sample of just the top 3 JSON files.
"{
""Item"": {
""sortKey"": {
""S"": ""DR-1630507718""
},
""vin"": {
""S"": ""1FMCU9GD2JUA29""
}
}
}"
"{
""Item"": {
""sortKey"": {
""S"": ""affc5dd0875c-1630618108496""
},
},
""vin"": {
""S"": ""SALCH625018""
}
}
}"
"{
""Item"": {
""sortKey"": {
""S"": ""affc5dd0875c-1601078453607""
},
""vin"": {
""S"": ""KL4CB018677""
}
}
}"
I created a local table and inserted data into it from the external table by casting the data types. Is this the correct approach, or should I use the PARSE_JSON function against the JSON files to store the data in the local table?
insert into DB.SCHEMA.INV_HIST (VIN, SORTKEY)
(SELECT value:Item.vin.S::string AS VIN, value:Item.sortKey.S::string AS SORTKEY FROM INV_EXT_TBL);
I resolved this by creating a materialized view that casts the VARIANT column on the external table. This got rid of the outer double quotes, and the performance improved multifold. I did not proceed with the table-creation approach.
CREATE OR REPLACE MATERIALIZED VIEW DB.SCHEMA.MVW_INV_HIST
AS
SELECT value:Item.vin.S::string AS VIN,
       value:Item.sortKey.S::string AS SORTKEY
FROM DB.SCHEMA.INV_EXT_TBL;
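If the VARIANT column actually holds the JSON as a quoted string rather than a parsed object (which the doubled quotes in the question suggest), a hedged variation is to run it through PARSE_JSON first, for example:

SELECT get_path(parse_json(value::string), 'Item.vin.S')::string AS VIN,
       get_path(parse_json(value::string), 'Item.sortKey.S')::string AS SORTKEY
FROM DB.SCHEMA.INV_EXT_TBL;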
So I managed to load data via the Snowflake CLI, but I want to automate this.
From what I read, I can load data using SQL statements (my table currently has one column: V VARIANT), and I'm loading data like this:
order => {
    connection.execute({
        sqlText: `INSERT INTO "xx"."xx"."xx" VALUES (${order})` // Also tried with only the table name
    });
}
But when I query everything from the table, it's empty. Why is that?
I have a SQLite database on one system, and I need to extract the data stored in SQLite into an Oracle database. How do I do this?
Oracle provides product called the Oracle Database Mobile Server (previously called Oracle Database Lite) which allows you to synchronize between a SQLite and an Oracle database. It provides scalable bi-directional sync, schema mapping, security, etc. The Mobile Server supports both synchronous and asynchronous data sync. If this is more than a one-time export and you need to keep your SQLite and Oracle Databases in sync, this is a great tool!
Disclaimer: I'm one of the Product Managers for Oracle Database Mobile Server, so I'm a bit biased. However, the Mobile Server really is a great tool to use for keeping your SQLite (or Berkeley DB) and Oracle Databases in sync.
You'll have to convert the SQLite database to a text file (not certain of the format) and then use Oracle to load the database from text (source is http://www.orafaq.com/wiki/SQLite). You can use the .dump command from the SQLite interactive shell to dump to a text file (see the docs for syntax).
SQL*Loader is a utility that will read a delimited text file and import it into an Oracle database. You will need to map out how each column from your flat file out of SQLite matches the corresponding one in the Oracle database. Here is a good FAQ that should help you get started.
If you are a developer, you could develop an application to perform the sync. You would do
SELECT name FROM sqlite_master WHERE type='table'
to get the table names, then you could re-create them in Oracle (you can do DROP TABLE tablename in Oracle first, to avoid a conflict, assuming SQLite will be authoritative) with CREATE TABLE commands. Getting the columns for each one takes
SELECT sql FROM sqlite_master WHERE type='table' and name='MyTable'
And then you have to parse the result:
// Requires: using System.Text.RegularExpressions;
string columnNames = Regex.Replace(sql, @"^[^\(]+\(([^\)]+)\)", "$1"); // keep only what's inside the parentheses
columnNames = Regex.Replace(columnNames, @" [^,]+", "");               // drop the type that follows each column name
string[] columnArray = columnNames.Split(',');
foreach (string s in columnArray)
{
    // Add column to table using:
    // ALTER TABLE MyTable ADD s NVARCHAR2(250)
}
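For example, if sqlite_master returned CREATE TABLE MyTable(columnA TEXT,columnB INTEGER), the first replace keeps only columnA TEXT,columnB INTEGER and the second strips the types, leaving columnA,columnB to split into the column names (this assumes no space after the commas in the original CREATE statement).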
A StringBuilder can be used to collect the table name with its columns to create your INSERT command. To add the values, it would just be a matter of doing SELECT * FROM MyTable for each of the tables during your loop through the table names you got back from the initial query. You would iterate the columns of the rows of the datatable you were returned and add the values to the StringBuilder:
INSERT INTO MyTable ( + columnA, columnB, etc. + ) VALUES ( datarow[0], datarow[1], etc. + ).
Not exactly like that, though - you fill in the data by appending the column name and its data as you run through the loops. You can get the column names by appending s in that foreach loop, above. Each column value is then set using a foreach loop that gives you each object obj in drData.ItemArray. If all you have are string fields, it's easy, you just add obj.ToString() to your StringBuilder for each column value in your query like I have below. Then you run the query after collecting all of the column values for each row. You use a new StringBuilder for each row - it needs to get reset to INSERT INTO MyTable ( + columnA, columnB, etc. + ) VALUES ( prior to each new row, so the new column values can be appended.
If you have mixed datatypes (i.e. DATE, BLOB, etc.), you'll need to determine the column types along the way, store it in a list or array, then use a counter to determine the index of that list/array slot and get the type, so you know how to translate your object into something Oracle can use - whether that means simply adding to_date() to the result, with formatting, for a date (since SQLite stores these as date strings with the format yyyy-MM-dd HH:mm:ss), or adding it to an OracleParameter for a BLOB and sending that along to a RunOracleCommand function. (I did not go into this, below.)
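For instance, a SQLite date string in that format could be wrapped roughly like this in the generated INSERT (the table and column names are just illustrative):

INSERT INTO MyTable (created_at)
VALUES (TO_DATE('2021-11-23 18:30:00', 'YYYY-MM-DD HH24:MI:SS'));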
Putting all of this together yields this:
string[] columnArray = null;
DataTable dtTableNames = GetSQLiteTable("SELECT name FROM sqlite_master WHERE type='table'");
if (dtTableNames != null && dtTableNames.Rows != null)
{
    if (dtTableNames.Rows.Count > 0)
    {
        // We have tables
        foreach (DataRow dr in dtTableNames.Rows)
        {
            // Do everything about this table here
            string tableName = dr["NAME"] != null ? dr["NAME"].ToString() : String.Empty;
            StringBuilder sb = new StringBuilder();
            sb.Append("INSERT INTO " + tableName + " ("); // we will collect column names here
            if (!String.IsNullOrEmpty(tableName))
            {
                RunOracleCommand("DROP TABLE " + tableName);
                // NOTE: Oracle requires at least one column in CREATE TABLE, so in practice
                // build the full CREATE statement from the column list parsed below.
                RunOracleCommand("CREATE TABLE " + tableName);
            }
            DataTable dtColumnNames = GetSQLiteTable("SELECT sql FROM sqlite_master WHERE type='table' AND name='" + tableName + "'");
            if (dtColumnNames != null && dtColumnNames.Rows != null)
            {
                if (dtColumnNames.Rows.Count > 0)
                {
                    // We have columns
                    foreach (DataRow drCol in dtColumnNames.Rows)
                    {
                        string sql = drCol["SQL"] != null ? drCol["SQL"].ToString() : String.Empty;
                        if (!String.IsNullOrEmpty(sql))
                        {
                            // Keep only the column list inside the parentheses, then strip the types
                            string columnNames = Regex.Replace(sql, @"^[^\(]+\(([^\)]+)\)", "$1");
                            columnNames = Regex.Replace(columnNames, @" [^,]+", "");
                            columnArray = columnNames.Split(',');
                            foreach (string s in columnArray)
                            {
                                // Add column to table; can hard-code like this or use logic to determine the datatype/column width
                                RunOracleCommand("ALTER TABLE " + tableName + " ADD " + s + " NVARCHAR2(250)");
                                sb.Append(s + ",");
                            }
                            sb.Remove(sb.Length - 1, 1); // trim the trailing comma
                            sb.Append(") VALUES (");
                        }
                    }
                }
            }
            // Get SQLite table data for insertion into Oracle
            DataTable dtTableData = GetSQLiteTable("SELECT * FROM " + tableName);
            if (dtTableData != null && dtTableData.Rows != null)
            {
                if (dtTableData.Rows.Count > 0)
                {
                    // We have data
                    foreach (DataRow drData in dtTableData.Rows)
                    {
                        // Start from a fresh copy of the baseline INSERT for each row
                        StringBuilder sbRow = new StringBuilder(sb.ToString());
                        foreach (object obj in drData.ItemArray)
                        {
                            // This is simplistic and assumes you have string data for an NVARCHAR2 field
                            sbRow.Append("'" + obj.ToString() + "',");
                        }
                        sbRow.Remove(sbRow.Length - 1, 1); // trim the trailing comma
                        sbRow.Append(")");
                        RunOracleCommand(sbRow.ToString());
                    }
                }
            }
        }
    }
}
All of this assumes you have a RunOracleCommand() void function that can take a SQL command and run it against an Oracle DB, and a GetSQLiteTable() function that can return a DataTable from your SQLite DB by passing it a SQL command.
Note that this code is untested, as I wrote it directly in this post, but it is based heavily on code I wrote to sync Oracle into SQLite, which has been tested and works.