I have set up Snowpipe to read a JSON file from S3, and for some reason it has suddenly paused. When I queried COPY_HISTORY, it shows the error "NULL result in a non-nullable column".
In my JSON file, only the first key-value pair is not nullable; the rest can be null.
I have checked the entire JSON file but couldn't find where it is null. Can someone tell me how to pinpoint the exact data where it is failing?
Thanks
I can see two options to troubleshoot this problem.
One option is to create an external table over this S3 JSON file and build a view on top of it, then check for possible null values coming from the JSON key-value pairs (see the sketch below).
If your JSON file is heavy and querying it through an external table and view does not give you acceptable performance for troubleshooting, the second option is to load the file into an internal stage and then build a view on top of that to analyze the null key-value pairs.
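For the first option, here is a rough sketch of the kind of check I mean, assuming a stage named @my_s3_stage over your S3 location and a non-nullable key called "id" (both names are placeholders for your actual objects):

    -- External table over the staged JSON; each record lands in the VALUE column
    CREATE OR REPLACE EXTERNAL TABLE my_json_ext
      WITH LOCATION = @my_s3_stage
      FILE_FORMAT = (TYPE = JSON);

    -- View that surfaces only the problem records and the file they came from
    CREATE OR REPLACE VIEW my_json_null_check AS
    SELECT METADATA$FILENAME AS source_file,
           value             AS raw_record
    FROM   my_json_ext
    WHERE  value:id IS NULL             -- key missing or SQL NULL
       OR  IS_NULL_VALUE(value:id);     -- key present but explicitly null in the JSON

    SELECT * FROM my_json_null_check;

Note the two checks: a missing key shows up as SQL NULL, while an explicit JSON null has to be caught with IS_NULL_VALUE, which is easy to miss when eyeballing the file.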
I am trying to set up an ELT pipeline into Snowflake, and it involves a transformation after loading.
This transformation currently creates or replaces a table using data queried from a source table in Snowflake, after performing some manipulations of the JSON data.
My question is: is create or replace table every time the transformation runs the proper way of doing it, or is there a way to update the data in the transformed table incrementally?
Any advice would be greatly appreciated!
Thanks!
You can insert into the load (source) table and put a stream on it; then you know which rows, or ranges of rows, need to be "reviewed", and you can upsert them into the output transform table.
That is, if you are doing something like daily aggregates, and in this batch you have data for the last 4 days, you read only the last four days of data from the source (sparing a full read), then aggregate and upsert via the MERGE command. With this model you save on reads, aggregation and writes.
We have also used high-water-mark tables to track the last data seen and/or the lowest value in the current batch.
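A rough sketch of that stream + MERGE pattern, where raw_events, its VARIANT column v, and daily_agg are placeholder names and the aggregate is just a row count per day:

    -- The stream tracks which rows have arrived since the last consuming DML
    CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events;

    -- Each run, aggregate only the newly arrived rows and upsert the affected days
    MERGE INTO daily_agg t
    USING (
        SELECT v:event_date::DATE AS event_date,
               COUNT(*)           AS event_count
        FROM   raw_events_stream
        WHERE  METADATA$ACTION = 'INSERT'
        GROUP  BY 1
    ) s
    ON  t.event_date = s.event_date
    WHEN MATCHED THEN
        UPDATE SET t.event_count = t.event_count + s.event_count
    WHEN NOT MATCHED THEN
        INSERT (event_date, event_count) VALUES (s.event_date, s.event_count);

Because the MERGE reads from the stream, it also advances the stream's offset, so the next run only sees rows loaded after this one.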
This is the first time I have tried to load a single-line JSON file into Snowflake via an external table.
The files are around 60 MB and stored on S3. They contain nested records and arrays, and no newline characters.
Right now I cannot see the data from the external table. However, if the file is small enough, like 1 MB, the external table works fine.
The closest solution I can find is this, but it doesn't give me a sufficient answer.
The files can be much bigger, and I have no control over them.
Is there a way to fix this issue?
thanks!
Edit:
Looks like the only tangible solution is to make the file smaller, as the JSON file is not NDJSON. What JSON format does STRIP_OUTER_ARRAY support?
If your nested records are wrapped in an outer array, you can potentially use STRIP_OUTER_ARRAY in the stage file format definition to remove it, so that each JSON element within the outer array gets loaded as a separate record. You can also use METADATA$FILENAME in the table definition to derive which file the elements came from.
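A rough sketch of what that could look like, assuming a stage named @my_s3_stage (the stage, file format and table names are placeholders):

    -- File format that splits the outer array into individual records
    CREATE OR REPLACE FILE FORMAT json_outer_array
      TYPE = JSON
      STRIP_OUTER_ARRAY = TRUE;

    -- External table; METADATA$FILENAME records which file each element came from
    CREATE OR REPLACE EXTERNAL TABLE my_big_json_ext (
        source_file VARCHAR AS (METADATA$FILENAME)
    )
      WITH LOCATION = @my_s3_stage
      FILE_FORMAT = (FORMAT_NAME = 'json_outer_array');

    -- Each element of the outer array should now come back as its own row in VALUE
    SELECT source_file, value FROM my_big_json_ext LIMIT 10;

This only helps if the 60 MB file really is one big outer array; once it is stripped, each element just has to fit within the per-record VARIANT size limit.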
Using the REST connector in Azure Data Factory, I am trying to fetch the Facebook campaign details.
In the pipeline, I have a web activity followed by a copy activity. In the mapping section, I can see only the three columns (id, name, status) from the first array, and the columns inside the second array are not listed.
(screenshots: graph.facebook.com response; Data Factory mapping)
Is there a way to get the columns listed inside the array? I also tried creating a data flow with the JSON file as the source and then used the flatten transformation, but I still cannot see the columns related to campaigns. Any help is appreciated. Thanks again.
I tested and found that Data Factory will consider the first object/JSON array element as the JSON schema.
If you can adjust the JSON data, then the "insights" columns can be recognized:
(screenshot: schema)
If you can't, then the "insights" columns will be missing:
In this case, there isn't a way to get all the columns listed inside the array.
HTH.
@Leon Yue, I found a way to do that.
Step 1: Copy the Facebook campaign data using the REST connector and save it as JSON in Azure Blob Storage.
(screenshot: copy activity to extract FB data as JSON and save in Blob)
Step 2: Create a data flow, using the JSON file from Blob as the source.
(screenshot: data flow task)
Step 3: Create a JSON schema file and save it on your desktop, with the insights array in the first row (which has all the column values). As per your previous comments, I created the JSON schema so that ADF considers the first object/JSON array as the JSON schema (see the sample shape after these steps).
Step 4: In the data flow's source dataset, map the JSON schema using the 'Import schema' option from the sample file.
(screenshot: import schema)
Now you will be able to see all the columns from the array.
(screenshots: all columns; flattened JSON)
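For illustration, a hypothetical sample-schema file of the shape described in Step 3, with the insights array already present in the first element so ADF infers those columns (the field names are made up for the example, not the exact Graph API response):

    {
      "data": [
        {
          "id": "1234567890",
          "name": "Campaign A",
          "status": "ACTIVE",
          "insights": {
            "data": [
              {
                "impressions": "0",
                "spend": "0",
                "date_start": "2021-01-01",
                "date_stop": "2021-01-31"
              }
            ]
          }
        }
      ]
    }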
While running a mapping I am getting a couple of database errors and the jobs fail:
1.) Arithmetic overflow error
2.) Conversion failed when converting date and/or time from character string.
This is purely a data issue (datatype errors and data length issues), and I want to reject these records and write them to a separate error table.
The .bad files in which these records are written contain characters that look like junk (',N,N,N,N' and ',D' and ',0'), and I am not sure on what basis we get these characters.
Do we get these for null values? How can I overcome this and get the exact output?
Is it possible to write these rejected records directly to a relational table (an error table with the same structure as the target table), or is there a workaround to achieve this?
You could use a Router transformation to route every record whose fields do not meet your criteria to the error table. This way you will handle them before they become bad rows.
Hey Vankat, just looking at your problem: try to filter out the records which don't meet your criteria by putting conditions (on data type, length) in a Router transformation and routing them to the error table, or capture those records in a flat file. Hope this gives you a clear picture.
I have been searching the internet for a solution to my problem but I cannot seem to find any info. I have a large single text file (10 million rows), and I need to create an SSIS package to load these records into different tables based on the transaction group assigned to each record. That is, Tx_Grp1 records would go into the Tx_Grp1 table, Tx_Grp2 records into the Tx_Grp2 table, and so forth. There are 37 different transaction groups in the single delimited text file, and records are inserted into this file in the order they actually occurred (by time). Also, each transaction group has a different number of fields.
Sample data file
date|tx_grp1|field1|field2|field3
date|tx_grp2|field1|field2|field3|field4
date|tx_grp10|field1|field2
.......
Any suggestion on how to proceed would be greatly appreciated.
This task can be solved with SSIS, just with some experience. Here are the main steps and discussion:
Define a Flat File data source for your file, describing all columns. A possible problem here is that field data types differ depending on the tx_group value. If this is the case, I would declare all fields as strings long enough to hold any value and convert their types later in the data flow.
Create an OLE DB connection manager for the DB you will use to store the results.
Create a main data flow where you will process the file, and add a Flat File Source.
Add a Conditional Split to the output of the Flat File Source, and define there as many filters and outputs as you have transaction groups.
For each transaction group output, add a Data Conversion for fields if necessary. Note: you cannot change the data type of an existing column; if you need to cast a string to an int, create a new column.
Add an OLE DB Destination for each destination table. Connect it to the proper transaction group output, and map the fields.
Basically, you are done. Test the package thoroughly on a test DB before using it on a production DB.