How to read from one DB but write to another using Snowflake's Snowpark? - snowflake-cloud-data-platform

I'm SUPER new to Snowflake and Snowpark, but I do have respectable SQL and Python experience. I'm trying to use Snowpark to do my data prep and eventually use it in a data science model. However, I cannot write to the database I'm pulling from -- I need to create all tables in a second DB.
I've created code blocks to represent both input and output DBs in their own sessions, but I'm not sure that's helpful, since I have to be in the first session in order to even get the data.
I use code similar to the following to create a new table while in the session for the "input" DB:
from snowflake.snowpark.functions import col

my_table = session.table("<SCHEMA>.<TABLE_NAME>")
my_table.toPandas()
table_info = my_table.select(col("<col_name1>"),
                             col("<col_name2>"),
                             col("<col_name3>").alias("<new_name>"),
                             col("<col_name4>"),
                             col("<col_name5>")
                             )
table_info.write.mode('overwrite').saveAsTable('MAINTABLE')
I need to save the table MAINTABLE to a secondary database that is different from the one where the data was pulled from. How do I do this?

It is possible to provide a fully qualified name:
table_info.write.mode('overwrite').saveAsTable('DATABASE_NAME.SCHEMA_NAME.MAINTABLE')
DataFrameWriter.save_as_table
Parameters:
table_name – A string or list of strings that specify the table name or fully-qualified object identifier (database name, schema name, and table name).
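A minimal sketch of the list form mentioned in the quoted parameter description, assuming table_info is the Snowpark DataFrame from the question and that the session's role is allowed to create tables in the target database. The helper name save_to_other_db and its arguments are made up for illustration; only save_as_table and write.mode come from the answer above.

```python
def save_to_other_db(df, database, schema, table, mode="overwrite"):
    """Write a Snowpark DataFrame to database.schema.table.

    Hypothetical helper: `df` is assumed to be a snowflake.snowpark.DataFrame.
    The name is passed as a list of parts, which the quoted documentation
    allows in place of a single dotted string.
    """
    name_parts = [database, schema, table]
    df.write.mode(mode).save_as_table(name_parts)
    return ".".join(name_parts)


# Usage (needs a live Snowpark session, so it is left commented out):
# save_to_other_db(table_info, "DATABASE_NAME", "SCHEMA_NAME", "MAINTABLE")
```

The read side does not change: the session can keep pointing at the input database, since only the write target is qualified.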

Related

How does SQLite DB save data of multiple tables in a single file?

I am working on a project to create a simplified version of the SQLite database. I got stuck trying to figure out how it manages to store the data of multiple tables with different schemas in a single file. I suppose it uses some indexes to map the data of the different tables. Can someone shed more light on how it's actually done? Thanks.
Edit: I suppose there is already an explanation in the docs, but looking for some easier way to understand it better and faster.
The schema is the list of all entities (tables, views, etc.) in the database as a whole; a database does not consist of many schemas on a per-entity basis.
The data itself is stored in pages, each page being owned by an entity. It is these blocks (pages) that are saved.
The default page size is 4K. You will notice that the file size is always a multiple of 4K. You could also experiment: create a database with some tables, note its size, then add some data; if the added data does not require another page, the file size stays the same. This demonstrates that it's all about pages rather than a linear/contiguous stream of data.
The schema is saved in a table called sqlite_master. This table has the following columns :-
type (the type e.g. table etc),
name (the name given to the entity),
tbl_name (the table to which the entity applies)
rootpage (the number of the entity's first, or root, page)
sql (the SQL used to generate the entity, if any)
Note that another schema table, sqlite_temp_master, may also exist if there are temporary tables.
For example :-
Using SELECT * FROM sqlite_master; could result in something like :-
2.6. Storage Of The SQL Database Schema
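If you want to see this for yourself, here is a small sketch using Python's built-in sqlite3 module. It creates a throwaway file with two differently shaped tables, then reads sqlite_master and compares the file size with the page size. The table names (users, orders) and the helper describe_database are invented for the demonstration.

```python
import os
import sqlite3
import tempfile


def describe_database(path):
    """Return (page_size, page_count, schema_rows) for the SQLite file at path."""
    conn = sqlite3.connect(path)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        schema_rows = conn.execute(
            "SELECT type, name, tbl_name, rootpage, sql FROM sqlite_master"
        ).fetchall()
    finally:
        conn.close()
    return page_size, page_count, schema_rows


# Build a throwaway database file with two tables of different shape.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
conn.commit()
conn.close()

page_size, page_count, schema_rows = describe_database(db_path)
file_size = os.path.getsize(db_path)

# Each schema row names an entity and the page where its data starts.
for row in schema_rows:
    print(row[:4])
print("file size:", file_size, "page size:", page_size, "pages:", page_count)
```

Each table gets its own rootpage, and the file size is the page size times the page count, which is the "everything is pages" point made above.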

Need a clever way to get orders from all stores while each store is in a different database

The setup
I have the following database setup:
CentralDB
Table: Stores
Table: Users
Store1DB
Table: Orders
Store2DB
Table: Orders
Store3DB
Table: Orders
Store4DB
Table: Orders
... etc
CentralDB contains the users, logging and a Stores table with the name of each store database and general information about each store such as address, name, description, image, etc...
All the StoreDBs use the same structure, just with different data.
It is important to know that the list of stores will shrink and grow in the future.
The main client communicating with this setup is an API REST Service which gets passed a STOREID in the Header of each request telling it which database to connect to. This works flawlessly so far.
The reasoning
Whenever we need to do database maintenance on one store, we don't want all other stores to be down.
Backup management should be per store
Not having to write the WHERE storeID=x every time and for every table
Performance: each store could run on its own database server if the need arises
The goal
I need my REST API Service to somehow get all orders from all stores in one query.
Will you help me figure out a way to do this without hardcoding all storedb names? I was thinking about a stored procedure on the CentralDB but I was hoping there would be other solutions. In any case it has to be very efficient.
One option would be to have a list of databases stored in a "system" table in CentralDB.
Then you could create a stored procedure that reads the database names from that table, loops through them with a cursor, and generates dynamic SQL that UNIONs the results from all the databases. This way you would get a single recordset of results.
However, this database design is IMHO flawed. There is no reason to use multiple databases to store data that belongs to the same "domain". All the reasons you have mentioned can be addressed with a single, properly designed database. Having multiple databases will create multiple problems in the long term:
you will need to change structure of all the DBs when you modify your database model
you will need to create/drop new databases when new stores are added/removed from your system
you will need to have items and other entities that are "common" to all the stores duplicated in all the DBs
what about reporting requirements (e.g. get sales data for stores 1 and 2 together, etc.) - this will require creating complex union queries...
etc...
In the long term, managing and maintaining this model will be a big pain.
I'd maintain a set of views that UNION ALL all the data. Every time a store is added or deleted those views must be updated. This can be automated.
The views provide an illusion to the application that there is only one database.
What I would not do is have each SQL query or procedure look up all the database names and build dynamic SQL. That would entail lots of code duplication and an unnecessary loss of performance, and the approach is error-prone. It is better to generate the code once in a central place and have all other SQL code reference that generated code.
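A rough sketch of how that central generation step could look, in Python. Everything specific here is an assumption: the question does not say which DBMS is used, so the bracketed names and the dbo schema are SQL Server-style guesses, and build_all_orders_view, the AllOrders view name and the extra StoreDB column are invented for illustration. The list of store databases would come from the Stores table in CentralDB.

```python
import re

# Plain identifiers only, so a database name can never inject SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_all_orders_view(store_dbs, view_name="AllOrders", table="Orders"):
    """Return a CREATE VIEW statement that UNION ALLs `table` from every store DB.

    Assumptions: database names are plain identifiers, each store keeps its
    table in the `dbo` schema (SQL Server style), and the literal StoreDB
    column is added only so rows can be traced back to their store.
    """
    store_dbs = list(store_dbs)
    if not store_dbs:
        raise ValueError("at least one store database is required")
    for name in store_dbs + [view_name, table]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    selects = [
        f"SELECT '{db}' AS StoreDB, o.* FROM [{db}].dbo.[{table}] AS o"
        for db in store_dbs
    ]
    return f"CREATE VIEW [{view_name}] AS\n" + "\nUNION ALL\n".join(selects) + ";"


print(build_all_orders_view(["Store1DB", "Store2DB", "Store3DB"]))
```

Re-running the generator whenever a store is added or removed keeps the view current, and the REST service only ever queries the view.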

Web2Py - Create table from user input

I'm trying to create a table in a database in web2py. I'm new to this and trying to get a hold of the MVC structure and how to call in between.
What I have done is, in /models/db.py, I created a DB:
TestDB = DAL("sqlite://storage.sqlite")
Then in my /controllers/default.py I have:
def index():
    form = FORM(INPUT(_name='name', requires=IS_NOT_EMPTY()),
                INPUT(_type='submit'))
    if form.process().accepted:
        TestDB().define_table(form.vars.name, Field('testField', unique=True))
        return dict(form=form)
    return dict(form=form)
But this isn't working. Could somebody help me with understanding how to achieve this?
Thank you.
First, to define the table, it would be TestDB.define_table(...), not TestDB().define_table(...).
Second, table definitions don't persist across requests (of course, the database tables themselves persist, but the DAL table definitions do not). So, if you define a table inside the index() function, that is the only place it will exist and be accessible in your app. If you want to access the table elsewhere in your app, you'll need to store the metadata about the table definition (in this case, just the table name) somewhere, perhaps in the database or a file. You would then retrieve that information on each request (and probably cache it to speed things up) and use it to create the table definition.
Another option would be to generate the table definition code and append it to a model file so it will automatically be run on each request. This is roughly how the new application wizard in the "admin" app works when generating a new application.
Finally, you could leave the table definition in the index() function as you have it, and then when you create the database connection on each request, you can use auto_import:
TestDB = DAL("sqlite://storage.sqlite", auto_import=True)
That will automatically create the table definitions for all tables in TestDB based on the metadata stored in the application's /databases/*.table files. Note, the metadata only includes database-specific metadata, such as table and field names and field types -- it does not include web2py-specific attributes, such as validators, default values, compute functions, etc. So, this option is of limited usefulness.
Of course, all of this has security implications. If you let users define tables and fields, a particular submission could mistakenly or maliciously alter existing database tables. So, you'll have to do some careful checking before processing user submissions.
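As one example of such checking, here is a minimal sketch of a whitelist-style test for the submitted table name. The function is_safe_table_name, the pattern and the 30-character limit are illustrative assumptions, not web2py requirements, and a real app would likely need more checks (reserved words, per-user quotas, and so on).

```python
import re

# Lowercase letter first, then lowercase letters, digits or underscores.
_SAFE_NAME = re.compile(r"[a-z][a-z0-9_]{0,29}")


def is_safe_table_name(name, existing_tables):
    """Accept only short lowercase identifiers that do not clash with
    an existing table. The pattern and length limit are illustrative
    assumptions, not web2py requirements."""
    if not isinstance(name, str) or not _SAFE_NAME.fullmatch(name):
        return False
    return name not in set(existing_tables)


# Possible use inside the controller (commented out, needs web2py):
# if is_safe_table_name(form.vars.name, TestDB.tables):
#     TestDB.define_table(form.vars.name, Field('testField', unique=True))
```

Using fullmatch rather than match with a trailing $ also rejects names that end in a newline.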

Copying tables between databases with different authentication DB2

Hey StackOverflow community,
My question is as follows:
I have a table, say USER_ADDR, with a bunch of columns in one database, say DB001.
I need to copy the contents of this table (based on a criterion) to a similar table USER_ADDR (same name, yes) in another database, DB002, which has a different user ID and password.
I need to do this in a stored procedure that will be executed from the .NET Framework.
I tried this:
INSERT INTO "DB002".USER_ADDR (--column names--)
SELECT *
FROM "DB001".USER_ADDR
WHERE ID = "APPLICATION_NO_IN";
I get:
0: Error occurred: [IBM][DB2/NT64] SQL0204N "DB002.USER_ADDR" is an undefined name. LINE NUMBER=15. SQLSTATE=42704 : -204: IBM.Data.DB2: 42704
What am I doing wrong?
Thanks in advance
Vashist
I'm deleting my other answer after seeing the additional info about your use case. LOAD is mainly for bulk loads of large numbers of records.
In this case I'd recommend that you open connection1 in .NET to your data source, select the data, and hold it in a .NET DataTable. If required, you can do that select in a stored proc that returns either individual column values for a single row or a cursor (rowset) that contains all the columns (and rows). Then, in .NET, open connection2 and insert the data from the DataTable into your destination. Again, that can be done with a stored proc.
Another approach is to use an external script that connects to both databases.
This is not possible from within a single database unless you use, as already mentioned, Information Integration (federation), or export the data and then load it.
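The same two-connection pattern, sketched in Python rather than .NET so it can be run as-is. This is only an illustration: sqlite3 in-memory databases stand in for the two separately authenticated DB2 connections, a plain list of rows plays the role of the DataTable, and the column list (ID, STREET, CITY) and the function copy_user_addr are invented.

```python
import sqlite3


def copy_user_addr(src_conn, dst_conn, application_no):
    """Copy matching USER_ADDR rows from one connection to another.

    Sketch only: sqlite3 stands in for two separately authenticated DB2
    connections, and the column list (ID, STREET, CITY) is invented.
    """
    rows = src_conn.execute(
        "SELECT ID, STREET, CITY FROM USER_ADDR WHERE ID = ?", (application_no,)
    ).fetchall()  # the rows are held in memory, like the DataTable
    with dst_conn:  # commit on success, roll back on error
        dst_conn.executemany(
            "INSERT INTO USER_ADDR (ID, STREET, CITY) VALUES (?, ?, ?)", rows
        )
    return len(rows)


# Two in-memory databases play the roles of DB001 and DB002.
db001 = sqlite3.connect(":memory:")
db002 = sqlite3.connect(":memory:")
for conn in (db001, db002):
    conn.execute("CREATE TABLE USER_ADDR (ID TEXT, STREET TEXT, CITY TEXT)")
db001.executemany(
    "INSERT INTO USER_ADDR VALUES (?, ?, ?)",
    [("A1", "1 Main St", "Springfield"), ("A2", "2 Oak Ave", "Shelbyville")],
)

copied = copy_user_addr(db001, db002, "A1")
copied_rows = db002.execute("SELECT * FROM USER_ADDR").fetchall()
print(copied, copied_rows)
```

Because each connection carries its own credentials, neither database needs to know about the other.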

Dynamic SQL statement return value using the current target connection

I'm currently creating my first real life project in Pervasive. The task is to map a certain XML structure containing orders (as in shops and products) to 3 tables I created myself. These tables rest inside a MS-SQL-Server instance.
All of the tables have a unique key called "id", an automatically incremented column. I've dropped this column from all mappings so that Pervasive will not try to fill it itself.
For certain calculations, for a split key in one of the tables, and for references to the created records in other tables, I will need the id that the database has just created. I googled for an answer: I can use "select @@identity;" as a statement, and this returns the id that has most recently been created on the current connection. This means that in Pervasive, I will have to execute this statement using the already existing target connection object.
But how do I do that? I am quite sure that I will need a DJImport or DJExport object, but how do I get one associated with the current connection that Pervasive uses to insert the records?
Or is there any other way to handle this auto increment when I need to reference the id in other tables?
Not sure how things work in Pervasive, but you may run into issues with @@identity. Scope_identity() would probably be safer but may still not work in Pervasive.
Hopefully your tables have a natural key in addition to the generated id, in which case you can select your id based on the natural key. This will avoid any issues you may have with disparate sessions and scope.
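A small sketch of the natural-key lookup, using Python's sqlite3 as a stand-in for the SQL Server target. The table order_header, the order_no natural key and the function insert_and_get_id are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_header ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " order_no TEXT NOT NULL UNIQUE)"  # order_no is the assumed natural key
)


def insert_and_get_id(conn, order_no):
    """Insert a header row, then read its generated id back by natural key."""
    conn.execute("INSERT INTO order_header (order_no) VALUES (?)", (order_no,))
    row = conn.execute(
        "SELECT id FROM order_header WHERE order_no = ?", (order_no,)
    ).fetchone()
    return row[0]


first_id = insert_and_get_id(conn, "PO-1001")
second_id = insert_and_get_id(conn, "PO-1002")
print(first_id, second_id)
```

The lookup does not depend on which connection or scope performed the insert, which is why it sidesteps the problem.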
If anyone is looking this post up and wondering about the answer, it's "You can't". Pervasive does not allow access to its own connection object, the one it uses to query the database. Without access to it, you cannot be guaranteed to fetch the right id. The solution for us was this: we used a stored procedure, called in the Before-Transformation event, that created the header record and returned the id and an optional error message as a table. We execute it, and it returns the id, which we then save and use throughout our mapping.
