My ARCHIVE db in Snowflake already has SCHEMA_X. A few days ago, new objects were added to SCHEMA_X in the PROD db. What would be good practice for moving only the new/changed objects, rather than the entire schema, to the ARCHIVE db?
My plan is as follows:
SQL query to identify new objects in the PROD db.
SQL query to compare existing objects in both schemas for differences.
Clone the results of the two steps above to ARCHIVE.
Does anyone have a more automated approach, maybe?
Clone the entire schema; that should be quick and up to date.
Note: cloning does not clone external tables and internal stages.
I would say cloning the entire schema is better, because problems start when, on top of adding objects, you also end up modifying existing ones, for example. If you're only storing the structure, not the data itself, the space needed to store that is pretty much none compared to the actual data (unless you end up changing the schema every 10 minutes).
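For anyone who still wants the selective route from the question, the comparison step could be sketched roughly as below. This is only an illustration: it assumes you have already pulled object names and last-altered timestamps from each database (for example from the TABLE_NAME and LAST_ALTERED columns of INFORMATION_SCHEMA.TABLES), and the function name and sample data are made up.

```python
from datetime import datetime


def objects_to_clone(prod, archive):
    """Split PROD objects into (new, changed) relative to ARCHIVE.

    Both arguments map object name -> last-altered timestamp. How those
    mappings are fetched is left out; this only does the comparison.
    """
    # Objects that exist in PROD but not in ARCHIVE at all.
    new = sorted(name for name in prod if name not in archive)
    # Objects present in both, but altered in PROD after the ARCHIVE copy.
    changed = sorted(
        name
        for name, altered in prod.items()
        if name in archive and altered > archive[name]
    )
    return new, changed


# Illustrative data only (hypothetical object names and dates).
prod = {
    "ORDERS": datetime(2024, 3, 1),
    "CUSTOMERS": datetime(2024, 3, 10),
    "REFUNDS": datetime(2024, 3, 12),
}
archive = {
    "ORDERS": datetime(2024, 3, 5),
    "CUSTOMERS": datetime(2024, 3, 5),
}

new_objects, changed_objects = objects_to_clone(prod, archive)
```

Note that a timestamp comparison like this only catches tables; views, procedures and other object types would need their own metadata queries, which is part of why the full-schema clone is simpler.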
I'm given a data source monthly that I'm parsing and putting into a MongoDB database. Each month, some of the data will be updated and some new entries will be added to the existing collections. The source file is a few gigabytes big. Apart from these monthly updates, the data will not change at all.
Eventually, this database will be live and I want to prevent having any downtime during these monthly updates if possible. What is the best way to update my database without any downtime?
This question is basically exactly what I'm asking, but not for a MongoDB database. The accepted answer there is to upload a new version of the database and then rename the new database to use the old one's name.
However, according to this question, it is impossible to easily rename a MongoDB database. This renders that approach unusable.
Intuitively, I would try to iteratively 'upsert' the entire database using each document's unique 'gid' identifier (this is a property of the data, as opposed to the "_id" generated by MongoDB) as a filter, but this might be an inefficient way of doing things.
I'm running MongoDB version 4.2.1
Why do you think updating the data would mean downtime?
It sounds like you don't want your users to be able to access the new data mid-load.
If this is the case, a strategy could be to have two databases, a live one and a staging one. Rather than renaming the staging database to live, you could just change the connection string in the client application(s) so that they point at the newly loaded database.
Also consider mongodump and mongorestore to copy databases; although these can be slower with larger databases.
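If you do go with the upsert-by-'gid' idea from the question, the per-document work is small. The sketch below only builds the filter and update documents in plain Python; it assumes each pair would then be handed to something like pymongo's `update_one(filter, update, upsert=True)`, and the helper name and sample records are made up for illustration.

```python
def build_upserts(documents, key="gid"):
    """Return one (filter, update) pair per source document.

    The filter matches on the data's own identifier, and the update uses
    $set so fields missing from the new file are left untouched.
    """
    pairs = []
    for doc in documents:
        if key not in doc:
            raise ValueError(f"document is missing the '{key}' field: {doc!r}")
        # Never try to overwrite MongoDB's own _id on an existing document.
        body = {k: v for k, v in doc.items() if k != "_id"}
        pairs.append(({key: doc[key]}, {"$set": body}))
    return pairs


# Illustrative monthly records (hypothetical fields).
monthly = [
    {"gid": 1, "name": "first"},
    {"gid": 2, "name": "second", "_id": "ignored"},
]
upserts = build_upserts(monthly)
```

With a unique index on 'gid', each upsert is an index lookup plus a write, so readers keep seeing the old version of a document until its own update lands.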
I was thinking of putting staging tables, and the stored procedures that update those tables, into their own schema. When importing data from SomeTable into the data warehouse, I would run an Initial.StageSomeTable procedure, which would insert the data into the Initial.SomeTable table. This way all the procs and tables dealing with the Initial staging are grouped together. Then I'd have a Validation schema for that stage of the ETL, etc.
This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process.
Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?
This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from Project REAL. You will find (download the doc from the first link) that they use quite a few schemas to group and identify objects in the warehouse.
In addition to dbo and etl, they also use admin, audit, part, olap and a few more.
I think it's appropriate enough, and it doesn't really matter. You could even use another database if you liked, which is actually what we do.
I'm not sure why you would want a validation schema though, what are you going to do there?
Both the reasons you list (purpose/intent, security) are valid reasons to use schemas. Once you start using them, you should always specify schema when referencing an object (although I'm lazy and never specify dbo).
One trick we use is to have the same-named table in each of several schemas, combined with table partitioning (available in SQL 2005 and up). Load the data in first schema, then when it's validated "swap" the partition into dbo--after swapping the dbo partition into a "dumpster" schema copy of the table. Net Production downtime is measured in seconds, and it's all carefully wrapped in a declared transaction.
An application runs training sessions. Environment for each session (like "mission" or "level" in games) is stored in a database.
Before starting a session, the user can choose which of the many available databases to use.
During the session, the database may be modified.
After the session, the changed database is usually discarded, but sometimes it may be saved under a new name or the same name.
Databases are often copied between non-connected computers (on a flash card).
If environment were stored in plain files, it would be easy: copy, load, save.
We currently use a similar approach: we store databases as MS SQL backups, copy and save them as files, and load them into the actual DBMS when a session starts. The main problem is modification: when the database schema changes, all the backups must be updated, which is error-prone.
Storing everything in a single database with an additional "environment id" relationship, and providing utilities to load, save and copy environments, seems too complex for the task.
What are other possible ways to design for that functionality? This problem is probably not unique and must have some thought-out solution.
Firstly, I think you need to dispense with the idea of SQL Backups for this and shift to tables that record data changes.
Then you have a source database containing all your regular tables, plus another table that records a list of saved versions of it.
So table X might contain columns TestID, TestDesc, TestDesc2, etc
Then you might have a table that contains SavedDBID, SavedDBTitle,etc
Next, for each table X you have a table X_Changes. This has the same columns as table X, but also includes a SavedDBID column. This would be used to record any changed rows between the source database and the Saved one for a given SavedDBID.
When the user logs on, you create a clone of the source database. Then you use the Changes tables to make the clone's tables reflect the saved version. As the user updates the main tables in the clone, the changed rows should also be updated in the clone's Changes tables.
If the user decides to save their copy, use the Clone's changes tables to record the differences between the Source and the Clone in the original database, then discard the Clone.
I hope this is understandable. It will certainly make any schema changes easier to immediately reflect in the 'backups' as you'd only have one database schema to change. I think this is much more straightforward than using SQL Backups.
As for copying databases around using flash cards, you can give them a copy of the source database but only including info on the sessions they want.
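To make the clone-then-overlay step more concrete, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names follow the example above; cloning a single table inside one in-memory database (rather than a whole database) and the sample rows are simplifications for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (TestID INTEGER PRIMARY KEY, TestDesc TEXT);
    CREATE TABLE X_Changes (
        SavedDBID INTEGER NOT NULL,
        TestID    INTEGER NOT NULL,
        TestDesc  TEXT,
        PRIMARY KEY (SavedDBID, TestID)
    );
    INSERT INTO X VALUES (1, 'original one'), (2, 'original two');
    -- Saved version 7 edited row 2 and added row 3.
    INSERT INTO X_Changes VALUES (7, 2, 'edited two'), (7, 3, 'added three');
""")


def load_saved_version(conn, saved_db_id):
    """Build X_Clone from the source table, then overlay the saved changes."""
    conn.execute("DROP TABLE IF EXISTS X_Clone")
    conn.execute(
        "CREATE TABLE X_Clone (TestID INTEGER PRIMARY KEY, TestDesc TEXT)"
    )
    # Step 1: start from a plain copy of the source rows.
    conn.execute("INSERT INTO X_Clone SELECT TestID, TestDesc FROM X")
    # Step 2: changed rows replace source rows; new rows are simply added.
    conn.execute(
        "INSERT OR REPLACE INTO X_Clone (TestID, TestDesc) "
        "SELECT TestID, TestDesc FROM X_Changes WHERE SavedDBID = ?",
        (saved_db_id,),
    )
    return conn.execute(
        "SELECT TestID, TestDesc FROM X_Clone ORDER BY TestID"
    ).fetchall()


clone_rows = load_saved_version(conn, 7)
```

This sketch does not handle rows deleted in a saved version; one option would be an extra flag column in the Changes table, but that is a design choice beyond what is described above.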
As one possible solution - virtualise your SQL server. You can have multiple SQL servers if you want and you can clone and roll them back independently.
I'm using SQL Server 2008. My database is almost 2GB in size. 90% of it is one table (as per sp_spaceused) that I don't need for most of my work.
I was wondering if it was possible to take this table, and have it backed up in a separate file, allowing me to transfer the important data on a more frequent basis than this one.
My guess is the easiest way to do this is create a new database, create the table there, copy the table contents to the new database, drop the table relationships, drop the table, create a view pointing to the other database and use that view in my applications.
However, I was wondering if you had any pointers to different strategies that I may not be aware of at this point.
Create the table in a different FileGroup.
Here's a link with some good examples.
This creates a second physical file for just that table. It can be placed on a different physical drive for performance. You can do a backup or restore of just specific filegroups, which is what it sounds like you need.
This is one example of the larger topic of "Data Partitioning", which involves various methods of dividing large tables across multiple files.
I suggest the filegroup solution. However to copy a table from a database to another you can do this trick:
SELECT * INTO MyNewDatabase..MyTable FROM MyOldDatabase..MyTable
We have a "master database structure", and need a routine to keep the database structure on client sites up-to-date.
A number of suggestions have been given to a related question, but I am looking for a more specific solution, along these lines:
I would like to generate a text file (XML or other readable format) which describes the entire database structure (this could go into version control). This routine will run in-house, to provide a database schema file to be distributed with the next version of our product.
Then I need a way to update the database structure on the client site so that it corresponds to the master database structure. (In other words, I don't want to have to keep track of numerous change scripts for different versions of the database structure, but a more general routine which can get the client database structure updated to the current master database structure.)
So the main feature I'm looking for could be described as "database structure to text" and "text to database structure".
There are a whole lot of diff tools that can give you the schema, stored procedure and constraint differences between two databases. You could roll your own, but if you have a complex schema I think that would be more expensive than one of these tools. Many offer a free trial, so you can test them.
The problem is that you'd have to have the master database online to do so, and accessible from the client database installation (or install it there), which might or might not be feasible.
If you won't do that, the only other sane option I can think of is to use the migration idea: keep a list of SQL script + version pairs, plus the current version on each database. This could be consolidated by a separate tool that generates a single script from the client's database version number and the list of changes. And if you don't have the list of changes, you can start with a diff tool run and keep track of them from there.
The text-comparison route you seem to prefer (comparing text SQL dumps of both schemas) looks very hard to get right automatically, so to me it doesn't look like the right path to take.
Several popular strategies are variants of this:
Add a table to the database:
CREATE TABLE Release
(release_number int not null,
applied datetime not null
)
Each release, as part of its release script inserts a row into this table.
You can now find out with a single query which release each client is running, and run all the releases between that one and the release they want to be running.
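As a rough sketch of that routine, the following uses Python's built-in sqlite3 module in place of a real client database. The release scripts, the `upgrade` function and the use of a text timestamp are all made up for illustration; only the Release table itself comes from the definition above.

```python
import sqlite3

# Hypothetical release scripts, keyed by release number.
RELEASES = {
    1: "CREATE TABLE Customer (id INTEGER PRIMARY KEY)",
    2: "ALTER TABLE Customer ADD COLUMN name TEXT",
    3: "CREATE TABLE Invoice (id INTEGER PRIMARY KEY, customer_id INTEGER)",
}


def upgrade(conn, target):
    """Apply every release after the installed one, up to `target`."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "Release" ('
        "release_number INTEGER NOT NULL, applied TEXT NOT NULL)"
    )
    # The single query that tells us which release this client is running.
    current = conn.execute(
        'SELECT COALESCE(MAX(release_number), 0) FROM "Release"'
    ).fetchone()[0]
    applied = []
    for number in sorted(RELEASES):
        if current < number <= target:
            conn.execute(RELEASES[number])
            # Each release records itself as part of its own script.
            conn.execute(
                'INSERT INTO "Release" VALUES (?, datetime(\'now\'))',
                (number,),
            )
            applied.append(number)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
first_run = upgrade(conn, 2)   # fresh install brought up to release 2
second_run = upgrade(conn, 3)  # later upgrade applies only release 3
```

Because the installed release number lives in the client database, the same routine works no matter how far behind a given site is.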
In addition, you could check that their schema is correct for each version (correct table names, columns, etc.) by doing something like this:
SELECT so.name,
sc.name
FROM sysobjects so
JOIN syscolumns sc ON sc.id = so.id -- match each column to its own table
WHERE so.type = 'U' -- user tables only
ORDER BY 1, 2
then calculate a hash of the result and compare it with a pre-computed hash (generated by running the query on your reference installation) to see if the installation is now correct.
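The hashing step could look something like the sketch below. It assumes the (table, column) rows from the query have already been fetched into Python; the function name, the choice of SHA-256 and the sample rows are illustrative, not part of any particular tool.

```python
import hashlib


def schema_fingerprint(rows):
    """Hash a collection of (table_name, column_name) pairs.

    Rows are sorted first so the result does not depend on the order in
    which the database happened to return them.
    """
    digest = hashlib.sha256()
    for table, column in sorted(rows):
        # The separator bytes stop ("ab", "c") colliding with ("a", "bc").
        digest.update(f"{table}\x1f{column}\n".encode("utf-8"))
    return digest.hexdigest()


# Illustrative rows (hypothetical tables and columns).
reference = [("Customer", "id"), ("Customer", "name"), ("Invoice", "id")]
client_ok = [("Invoice", "id"), ("Customer", "name"), ("Customer", "id")]
client_bad = [("Customer", "id"), ("Invoice", "id")]

expected = schema_fingerprint(reference)
matches = schema_fingerprint(client_ok) == expected
mismatch = schema_fingerprint(client_bad) != expected
```

A hash only tells you that something differs, not what. Adding column types to the query (and to the hashed string) would make the check stricter.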