Demo Db and UTIL Db in snowflake - snowflake-cloud-data-platform

When we create a new snowflake account, it comes with two blank databases DEMO_DB and UTIL_DB. I always wonder what is the reason for providing two blank databases. Would not DEMO_DB be enough? Does anyone know why the util_db is required?

As you can see from the comment of the database, UTIL_DB is an "utility database".
As far as I know, DEMO_DB does not contain any file format objects when the account is created. Sample file format objects (ie: single_column_rows, csv, csv_dq, tsv, psv and more...) are created under UTIL_DB. I'm not talking about file format types; these are sample file format objects!
Additionally, it contains two utility functions:
SELECT * FROM TABLE(util_db.public.sfwho());
which returns a row containing the current timestamp, account name, user, role, database, schema and warehouse information.
SELECT util_db.public.decode_uri('https%3A%2F%2Fgokhanatil.com');
which decodes an encoded URI (calling the decodeURIComponent function):
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent

Related

How to read from one DB but write to another using Snowflake's Snowpark?

I'm SUPER new to Snowflake and Snowpark, but I do have respectable SQL and Python experience. I'm trying to use Snowpark to do my data prep and eventually use it in a data science model. However, I cannot write to the database from which I'm pulling from -- I need to create all tables in a second DB.
I've created code blocks to represent both input and output DBs in their own sessions, but I'm not sure that's helpful, since I have to be in the first session in order to even get the data.
I use code similar to the following to create a new table while in the session for the "input" DB:
my_table= session.table("<SCHEMA>.<TABLE_NAME>")
my_table.toPandas()
table_info = my_table.select(col("<col_name1>"),
col("<col_name2>"),
col("<col_name3>").alias("<new_name>"),
col("<col_name4"),
col("<col_name5")
)
table_info.write.mode('overwrite').saveAsTable('MAINTABLE')
I need to save the table MAINTABLE to a secondary database that is different from the one where the data was pulled from. How do I do this?
It is possible to provide fully qualified name:
table_info.write.mode('overwrite').saveAsTable('DATABASE_NAME.SCHEMA_NAME.MAINTABLE')
DataFrameWriter.save_as_table
Parameters:
table_name – A string or list of strings that specify the table name or fully-qualified object identifier (database name, schema name, and table name).

How does SQLITE DB saves data of multiple tables in a single file?

I am working on a project to create a simplified version of SQLite Database. I got stuck when trying to figure out how does it manages to store data of multiple tables with different schema, in a single file. I suppose it should be using some indexes to map the data of different tables. Can someone shed more light on how its actually done? Thanks.
Edit: I suppose there is already an explanation in the docs, but looking for some easier way to understand it better and faster.
The schema is the list of all entities (tables, views etc) (the database as a whole) rather than a database existing of many schemas on a per entity basis.
Data itself is stored in pages each page being owned by an entity. It is these blocks that are saved.
The default page size is 4k. You will notice that the file size will always be a mutliple of 4K. You could also, with experimentation create a database with some tables, note it's size, then add some data, and if the added data does not require another page, see that the size of the file is the same. This demonstrating how it's all about pages rather than a linear/contiguos stream of data.
It, the schema, is saved in a table called sqlite_master. This table has columns :-
type (the type e.g. table etc),
name (the name given to the entity),
tbl_name (the tale to which the entity applies )
root page (the map to the first page)
sql (the SQL used to generate the entity, if any)
note that another schema, sqlite_temp_master, may also exist if there are temporary tables.
For example :-
Using SELECT * FROM sqlite_master; could result in something like :-
2.6. Storage Of The SQL Database Schema

Export Azure SQL Database to XML File

I have searched Google and this site for about 2 hours trying to gather how to do this and no luck on a way that fits/ I understand. As the title says, I need to export table data to an XML file. I have an Azure SQL database with table data.
Table name: District
Table Columns: Id, name, organizationType, address, etc.
I need to take this data and create a XML file that I can save so that it can be given to others.
I have tried using:
SELECT *
FROM dbo.District
FOR XML PATH('districtEntry'), ROOT('leaID')
It gives me the data in XML format, but I don't see a way to save it.
Also, there are some functions I need to be able to perform with the data:
Program should have these options:
1) Export all data.
2) Export all rows created or updated since a specified date.
Files should be named in format ENTITY.DATE.XML, as in
DISTRICT.20150521.XML (use date in YYYYMMDD format).
This leads me to believe I need to write code other than SQL since a requirement would be to query the table for certain data elements as well.
I was wondering if I would need to download any Database Server Data Tools, write code, and if so, in what language, etc. The XML file creation would need to be automated I believe after every update of the table or after a query.
I am very confused and in need of guidance as I now have almost given up hope. Please let me know if I need to clarify anything. Thank you.
P.S. I would have given pictures but I do not have enough reputation to supply them.
I would imagine you're looking to write a program in VB.NET or C#, using ADO.NET in either case. Here's an MSDN article with a complete sample of how to connect to and query SQL Azure:
https://msdn.microsoft.com/en-us/library/azure/ee336243.aspx
The example shows how to write the output to the Console, but you could also write the output similarly using something like a StreamWriter to write it to a file.
You could also create a sqlcmd script to do this, following the guidelines here to connect using sqlcmd:
https://msdn.microsoft.com/en-us/library/azure/ee336280.aspx
Alternatively, if this is a process that does not need to be automated or repeated frequently, you could do it using SSMS:
http://azure.microsoft.com/en-us/documentation/articles/sql-database-manage-azure-ssms/
Running your query through SSMS would produce an XML document, which could be saved using File->Save As

Copying tables between databased with different authentication DB2

Hey StackOverflow community,
My question is as follows:
I have a table, say USER_ADDR with a bunch of columns in one database, say DB001
I need to copy the contents of this table(based on a criteria) to a similar table USER_ADDR (same name, yes) in another database DB002 with a different userID and pwd.
I need to do this in a stored procedure that will be executed using a .net framework.
I tried this:
INSERT INTO "DB002".USER_ADDR (--column names--)
SELECT *
FROM "DB001".USER_ADDR
WHERE ID = "APPLICATION_NO_IN";
I get:
0: Error occurred: [IBM][DB2/NT64] SQL0204N "DB002.USER_ADDR" is an undefined name. LINE NUMBER=15. SQLSTATE=42704 : -204: IBM.Data.DB2: 42704
What am I doing wrong?
Thanks in advance
Vashist
i'm deleting my other answer after seeing the additional info about your use case. Load is mainly for bulk loads of large numbers of records.
in this case i'd recommend you do something like open connection1 in .Net to your data source, select the data and hold it in a .Net DataTable. If required, you can do that select in a stored proc that returns either individual column values for a single row or return a cursor (rowset) that contains all the columns (and rows). Then in .Net open connection2 and insert the data from the DataTable to your destination. Again, that can be done with a stored proc.
Another approach is using an external script that connects to both databases.
From just one database is not possible, at least you use, as already mentioned, Information integration (federation) or by exporting the data and then loading it.

SQL2008 Integration Services - Loading CSV files with varying file schema

I'm using SQL2008 to load sensor data in a table with Integration Services. I have to deal with hundreds of files. The problem is that the CSV files all have slightly different schemas. Each file can have a maximum of 20 data fields. All data files have these fields in common. Some files have all the fields others have some of the fields. In addition, the order of the fields can vary.
Here’s and example of what the file schemas look like.
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,CL_1,RS_1,RI_1,PR_1,RD_1,SH_1,CL_2
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,CL_1,RS_1,RI_1,PR_1,WS_1,WD_1,WSM_1,WDM_1,SH_1
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,RS_1,RI_1,PR_1,RD_1,WS_1,WD_1,WSM_1,WDM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,PR_1,VI_1,PW_1,WS_1,WD_1,WSM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,WS_1,WD_1,WSM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,PR_1,VI_1,WS_1,WD_1,WSM_1
I’m using a Data Flow Script Task to process the data via CreateNewOutputRows() and MyOutputBuffer.AddRow(). I have a working package to load the data however it’s not reliable and robust because as I had more files the package fails because the file schema has not been defined in CreateNewOutputRows().
I'm looking for a dynamic solution that can cope with the variation in the file schema. Doeas anyone have any ideas?
Who controls the data model for the output of the sensors? If it's not you, do they know what they are doing? If they create new and inconsistent models every time they invent a new sensor, you are pretty much up the creek.
If you can influence or control the evolution of the schemas for CSV files, try to come up with a top level data architecture. In the bad old days before there were databases, files made up of records often had, as the first field of each record, a "record type". CSV files could be organized the same way. The first field of every record could indicate what type of record you are dealing with. When you get an unknown type, put it in the "bad input file" until you can maintain your software.
If that isn't dynamic enough for you, you may have to consider artificial intelligence, or looking for a different job.
Maybe the cmd command is good. in the cmd, you can use sqlserver import csv.
If the CSV files that all have identical formats use the same file name convention or if they can be separated out in some fashion you can use the ForEach Loop Container for each file schema type.
Possible way to separate out the CSV files is run a Script (in VB) in SSIS that reads the first row of the CSV file and checks for the differing types (if the column names are in the first row) and then moves the files to the appropriate folder for use in the ForEach Loop Container.

Resources