I have a table with three fields (id UInt64, meta JSON, data JSON). There are two separate files, e.g. meta.json and data.json, from which I want to ingest data into the meta and data fields respectively. How can I achieve this using FROM file()?
Also, how do I refer to these files once inside the ClickHouse client shell (started with clickhouse-client --password)?
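For reference, this is roughly the setup I mean (the table name and file layout below are only illustrative, and I am assuming JSONEachRow files placed under the server's user_files_path):

CREATE TABLE docs
(
    id   UInt64,
    meta JSON,
    data JSON
)
ENGINE = MergeTree
ORDER BY id;

-- I can read a single staged file through the file() table function:
SELECT * FROM file('meta.json', 'JSONEachRow');

-- but it is not clear how to fill meta from meta.json and data from data.json in the same table.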
I am unable to load XML data into a SQL Server DB using the Copy Data activity, but I am able to achieve this with data flows using the flatten transformation. In the Copy Data activity the corresponding array does not come through in the mapping properly: the pipeline succeeds, but only partial data is loaded into the DB.
Auto-creation of the table is also not allowed when doing a copy activity for an XML file; I have to create the table with a script first and then load the data.
Since we are using a SHIR (self-hosted integration runtime), this needs to be done with the Copy Data activity.
Use the collection reference in the mapping tab of the copy activity to unroll the array and extract the data. I reproduced this using a copy activity with sample nested XML data.
Img 1: source dataset preview.
In the mapping tab, select 'Import schemas'.
Toggle on the 'Advanced editor'.
Give the JSON path of the array from which the data needs to be iterated and extracted.
Img 2: mapping settings.
When the pipeline is run, the data is copied successfully to the database.
Img 3: sink data after copying.
Reference: MS documentation on copying from a hierarchical source to a tabular sink.
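For reference, once imported and edited, the translator section of the copy activity's mapping JSON looks roughly like the sketch below; the paths and collection reference here are placeholders and depend on your source structure (the collection reference is the JSON path of the repeating array that gets unrolled):

{
    "type": "TabularTranslator",
    "mappings": [
        { "source": { "path": "['id']" },   "sink": { "name": "id" } },
        { "source": { "path": "['name']" }, "sink": { "name": "name" } }
    ],
    "collectionReference": "$['orders']['order']"
}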
I am trying to get customer details such as a photograph, Aadhaar and other identity documents, and bank details in PDF format. I want these details to be saved in PostgreSQL. What is the best format to store them in the DB? I will also need access to those files in the DB whenever I need them. There won't be many updates to the files, though.
I have the files on my Golang server as the multipart.File type.
// DocumentsAll holds all the uploaded files received by the handler.
type DocumentsAll struct {
    Photograph    multipart.File `json:"photograph"`
    Identityproof multipart.File `json:"identityproof"`
}
This is the struct I have that contains all the files.
My column in the DB is as follows:
ALTER TABLE LOANS ADD COLUMN Documentlist bytea;
i.e. the column that contains the documents is of data type bytea.
When I try to save the documents in this format, this is what gets stored in the DB:
\x7b2270686f746f6772617068223a7b7d2c22706f69223a6e756c6c2c22706f72223a6e756c6c2c2273616c617279736c697073223a6e756c6c2c2262616e6b73746174656d656e74223a6e756c6c2c2263686571756573223a6e756c6c2c22706f6a223a6e756c6c2c22706f6f223a6e756c6c2c22706f71223a6e756c6c2c226c6f616e41677265656d656e74223a6e756c6c2c227369676e766572696679223a6e756c6c2c22656373223a6e756c6c2c227365637572697479636865717565223a6e756c6c2c22706f74223a6e756c6c2c22697472223a6e756c6c2c226c6f616e73746174656d656e74223a6e756c6c7d
So when I try to read it back, I am not able to retrieve the files.
We have staged the log files in an external S3 stage. The staged log files are in CEF format. How can we parse the CEF files from the stage and move the data into Snowflake?
If the files have a fixed format (i.e. there are record and field delimiters and each record has the same number of columns) then you can just treat them as text files and create an appropriate file format.
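For example, a minimal sketch of that approach, assuming the CEF records are newline-delimited with '|' between the header fields (the stage, table and format names are placeholders, and the target table is assumed to match the resulting column count):

CREATE FILE FORMAT cef_as_text
  TYPE = 'CSV'
  FIELD_DELIMITER = '|'
  RECORD_DELIMITER = '\n';

COPY INTO raw_cef_events
FROM @cef_stage
FILE_FORMAT = (FORMAT_NAME = 'cef_as_text');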
If the file has a semi-structured format then you should be able to load it into a variant column - whether you get multiple rows per file or only one depends on the file structure. If you can only create one record per file then you may run into issues with file size, as a variant column has a maximum size.
Once the data is in a variant column you should be able to process it to extract usable data from it. If there is a structure Snowflake can process (e.g. XML or JSON) then you can use the native capabilities. If there is no recognisable structure then you'd have to write your own parsing logic in a stored procedure.
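As a simplified illustration of the "write your own parsing logic" route (using plain string functions here instead of a stored procedure; the table and column names are made up, and each raw CEF line is assumed to have been loaded into a single string column):

SELECT
    SPLIT_PART(raw_line, '|', 2) AS device_vendor,
    SPLIT_PART(raw_line, '|', 3) AS device_product,
    SPLIT_PART(raw_line, '|', 6) AS event_name
FROM raw_cef_lines;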
Alternatively, you could try to find another tool that converts your files to an XML/JSON format, and then Snowflake can easily process those files.
Using the REST connector in Azure Data Factory, I am trying to fetch the Facebook campaign details.
In the pipeline, I have a web activity followed by a copy activity. In the mapping section, I can see only the three columns (Id, name, status) from the first array, and I am not getting the columns listed inside the second array.
graph.facebook.com
data factory mapping
Is there a way to get the columns listed inside the array? I also tried creating a data flow taking the JSON file as the source and then used the flatten transformation, but I still cannot see the columns related to the campaigns. Any help is appreciated. Thanks again.
I tested and found that Data Factory considers the first object of the JSON array as the JSON schema.
If you can adjust the JSON data, then the "insights" column can be recognized:
Schema:
If you can't, then the "insights" column will be missed:
In this case, there isn't a way to get all the columns listed inside the array.
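To illustrate with a made-up, trimmed payload (the field names inside insights are only examples): if the first object of the array already carries insights, as below, ADF picks the column up; if only later objects have it, the column is dropped from the mapping.

[
    { "id": "1", "name": "campaign_1", "status": "ACTIVE",
      "insights": { "data": [ { "impressions": "100", "spend": "5" } ] } },
    { "id": "2", "name": "campaign_2", "status": "PAUSED" }
]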
HTH.
@Leon Yue, I found a way to do that.
Step 1: Copy the Facebook campaign data using the REST connector and save it as JSON in Azure Blob.
Step 1: copy activity to extract FB data as JSON and save in Blob
Step 2: Create a data flow, taking the JSON file from Blob as the source.
Step 2: Data flow task
Step 3: Create a JSON schema file and save it on your desktop, with the insights array in the first row (the one that has all the column values). As per your previous comments, I created the JSON schema such that ADF will consider the first object of the JSON array as the JSON schema.
Step 4: In the data flow's source dataset, map the JSON schema using the 'Import schema' option from a sample file.
Step 4: Import schema
Now you will be able to see all the columns from the array.
All columns
Flatten JSON
I need to get values from a certain column in an XLSX spreadsheet that was uploaded to my database into an image (blob) field. I would like to step through the rows, get the values from, say, column 4, and insert them into another table using SQL Server. I can do it with CSV files by casting the image field to varbinary, then casting it again to varchar and searching for commas.
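For reference, the CSV trick just described looks something like this (the table and column names are placeholders):

SELECT CAST(CAST(Document AS varbinary(max)) AS varchar(max)) AS csv_text
FROM dbo.UploadedFiles;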
Can openrowset work on a blob field?
I doubt that this can work. Even though the data in the XLSX is stored in Microsoft's Office Open XML format (http://en.wikipedia.org/wiki/Office_Open_XML), the XML is then zipped, which means that your XLSX file is a binary file. So if you want to access the data in the XLSX (can't you use CSV instead?), I think you need to do so programmatically. Depending on the programming language of your choice, there are various open-source projects allowing you to access XLSX files.
Java: Apache POI http://poi.apache.org/spreadsheet/
C++: http://sourceforge.net/projects/xlslib/?source=directory
...