At present we have very little referential integrity, as well as having a number of tables that self-join (and indeed would perhaps better be represented as separate tables or views that joined).
The knowledge of how these tables relate to each other is implicit in the logic of the stored procedures rather than explicit in the schema. We are considering changing this.
The first step is to actually understand the implicit relationships and document them.
So my question is...
What is the best way to extract that implicit information, short of eyeballing every stored procedure? I will consider any tools, writing my own SQL to interrogate the system tables, or utilising the SQL-DMO model - or in fact anything under the sun that lets the computer do more work and me do less.
If the relationships are only identified by joins in the SPs, then you're not going to have a lot of luck automating it.
It might be worthwhile capturing queries using the profiler to find the most frequent joins first.
When it comes to refactoring, I am old-school:
Document what you have, using a visual tool.
Describe -- in writing -- the business model that this database captures.
Pick out entities from the nouns in that description and from the existing schema you have.
Create a new ER model; consult with business while at it.
Create a new DB based on the ER model.
ETL data over to the new db and test.
You can use sys.sql_dependencies to find out what columns and tables an SP depends on (helps if you don't do SELECT * in your SPs). This will help you get an inventory of candidates at least:
referenced_major_id == the OBJECT_ID of the table
referenced_minor_id == the column id, i.e. COLUMNPROPERTY(referenced_major_id, COLUMN_NAME, 'ColumnId')
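As a starting point, here is a minimal inventory query over sys.sql_dependencies (a sketch assuming SQL Server 2005+; COL_NAME turns the minor id back into a column name):

SELECT OBJECT_NAME(d.object_id) AS referencing_module,
       OBJECT_NAME(d.referenced_major_id) AS referenced_table,
       COL_NAME(d.referenced_major_id, d.referenced_minor_id) AS referenced_column
FROM sys.sql_dependencies AS d
WHERE OBJECTPROPERTY(d.object_id, 'IsProcedure') = 1
ORDER BY referencing_module, referenced_table, referenced_column;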
You might have to use sp_refreshsqlmodule to ensure that the dependency metadata is up to date for that to work. That is, if you change a view, you need to run sp_refreshsqlmodule on each non-schema-bound module that depended on that view (schema-bound modules don't allow underlying changes in the first place, but you will get an error if you call sp_refreshsqlmodule on a schema-bound object). You can automate that by calling sp_refreshsqlmodule on these objects:
SELECT *
FROM INFORMATION_SCHEMA.ROUTINES
WHERE OBJECTPROPERTY(OBJECT_ID(QUOTENAME(ROUTINE_SCHEMA) + '.'
+ QUOTENAME(ROUTINE_NAME)),
N'IsSchemaBound') IS NULL
OR OBJECTPROPERTY(OBJECT_ID(QUOTENAME(ROUTINE_SCHEMA) + '.'
+ QUOTENAME(ROUTINE_NAME)),
N'IsSchemaBound') = 0
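A sketch of that automation with a plain cursor (the ISNULL collapses the IS NULL / = 0 test above):

DECLARE @name nvarchar(517);
DECLARE refresh_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(ROUTINE_SCHEMA) + '.' + QUOTENAME(ROUTINE_NAME)
    FROM INFORMATION_SCHEMA.ROUTINES
    WHERE ISNULL(OBJECTPROPERTY(OBJECT_ID(QUOTENAME(ROUTINE_SCHEMA) + '.'
                                          + QUOTENAME(ROUTINE_NAME)),
                                N'IsSchemaBound'), 0) = 0;
OPEN refresh_cur;
FETCH NEXT FROM refresh_cur INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_refreshsqlmodule @name;  -- re-derives the module's dependency metadata
    FETCH NEXT FROM refresh_cur INTO @name;
END;
CLOSE refresh_cur;
DEALLOCATE refresh_cur;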
Background
I have a multi-tenant scenario and a single SQL Server project that will be deployed into multiple databases on the same server. There will be one db for each tenant, plus one "model" db.
The "model" database serves three purposes:
Forces some "system" data to be always present in each tenant database
Serves as an access point for users with a special permission to edit system data (which will then be synced to all tenants)
Serves as the source when creating a new tenant: the database is copied and attached under a new name representing the tenant
There are triggers that check whether modified or deleted data within a tenant db corresponds to "system" data inside the "model" db. If it does, an error is raised saying that system data cannot be altered.
Issue
So here's a part of the trigger that checks if deletion can be allowed:
IF DB_NAME() <> 'ModelTenant' AND EXISTS
(
    SELECT [deleted].*
    FROM [deleted]
    INNER JOIN [---MODEL DB NAME??? ---].[MySchema].[MyTable] [ModelTable]
        ON [deleted].[Guid] = [ModelTable].[Guid]
)
BEGIN;
    THROW 50000, 'The DELETE operation on table MyTable cannot be performed. At least one targeted record is reserved by the system and cannot be removed.', 1;
END
I can't seem to find what should take the place of --- MODEL DB NAME??? --- in the above code that would allow the project to compile properly. When referring to a completely different project I know what to do: use a reference to that project, represented by a SQLCMD variable. But in this scenario the referenced project is essentially the same project, only deployed to a different database, and I can't seem to add a self-reference in this manner.
What can I do? Does SSDT offer some kind of support for such a scenario?
Have you tried setting up a Database Variable? You can read under "Reference aware statements" here. You could then say:
SELECT * FROM [$(MyModelDb)].[MySchema].[MyTable] [ModelTable]
If you don't have a specific project for $(MyModelDb) you can choose the option to "suppress errors by unresolved references...". It's been forever since I've used SSDT projects, but I think that should work.
TIP: If you need to reference one table 100 times, you may find it better to create a SYNONYM that uses the database variable, then point to the SYNONYM in your SPROCs/TRIGGERs. Why? Because that way you don't need to redeploy your SPROCs/TRIGGERs to get the variable replaced with the actual value, and that can make development smoother.
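A minimal sketch of that approach, reusing the $(MyModelDb) variable from above (the synonym name is illustrative):

CREATE SYNONYM [MySchema].[ModelTable]
    FOR [$(MyModelDb)].[MySchema].[MyTable];

The trigger body can then join against [MySchema].[ModelTable] and never carries the SQLCMD variable itself.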
I'm not quite sure if SSDT is particularly well-suited to projects of any decent amount of complexity. I can think of one or two ways to most likely accomplish this (especially depending on exactly how you do the publishing / deployment), but I think you would actually lose more than you gain. What I mean by that is: you could add steps to get this to work (i.e. win the battle), but you would be creating a more complex system in order to get SSDT to publish a system that is more complex (and slower) than it needs to be (i.e. lose the war).
Before worrying about SSDT, let's look at why you need/want SSDT to do this in the first place. You have system data intermixed with tenant data, and you need to validate UPDATE and DELETE operations to ensure that the system data does not get modified, and the only way to identify data that is "system" data is by matching it to a home-of-record -- ModelDB -- based on GUID PKs.
That theory on identifying what data belongs to the "system" and not to a tenant is your main problem, not SSDT. You are definitely on the right track for a multi-tenant system by having the "model" database, but using it for data validation is a poor design choice: on top of the performance degradation already incurred from using GUIDs as PKs, you are further slowing down all of these UPDATE and DELETE operations by funneling them through a single point of contention, since all client DBs need to check this common source.
You would be far better off to include a BIT field in each of these tables that mixes system and tenant data, denoting whether the row was "system" or not. Just look at the system catalog views within SQL Server:
sys.objects has an is_ms_shipped column
sys.assemblies went the other direction and has an is_user_defined column.
So, if you were to add an [IsSystemData] BIT NOT NULL column to these tables, your Trigger logic would become:
IF DB_NAME() <> N'ModelTenant' AND EXISTS
(
    SELECT del.*
    FROM [deleted] del
    WHERE del.[IsSystemData] = 1
)
BEGIN
    ;THROW 50000, 'The DELETE operation on table MyTable cannot be performed. At least one targeted record is reserved by the system and cannot be removed.', 1;
END;
Benefits:
No more SSDT issue (at least not from this part ;-)
Faster UPDATE and DELETE operations
Less contention on the shared resource (i.e. ModelDB)
Less code complexity
As an alternative to referencing another database project, you can produce a dacpac, then reference the dacpac as a database reference in "same server, different database" mode.
I am working on an Oracle 10.2 database for a web project. I exported the full schema objects of a database from a remote system into a file (some-file.dmp). Then I wanted to import the contents of the file into another database on the local system. The process worked perfectly.
However, I accidentally imported the file contents (including tables, views etc.) into the SYS user. So, the SYS user is now overcrowded with around 1500 unwanted objects.
I know I can drop the objects individually, but that's a tiresome effort. So I was wondering: is there any way I can undo the process and remove the unwanted objects (remove the tables, views etc. from the SYS user that were mistakenly imported)?
EDIT:
The imported objects include, in particular:
Tables (obviously including FK constraints)
Views
Indexes
Packages
Procedures
Functions
Sequences
Triggers
Java Code
So they are interrelated. Any ideas or advice greatly appreciated!
You can try querying DBA_OBJECTS and looking for any that are owned by SYS and recently created. For example, the following lists all objects that were created in the SYS schema today:
SELECT object_name, object_type
FROM dba_objects
WHERE owner = 'SYS'
AND created >= TRUNC(SYSDATE)
You can then use this to generate some dynamic SQL to drop the objects. That should save you dropping them manually.
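For example, a hedged sketch of generating the DROP statements (review the generated list carefully before executing any of it; CASCADE CONSTRAINTS takes care of the foreign keys between the imported tables):

SELECT 'DROP ' || object_type || ' SYS.' || object_name
       || CASE object_type WHEN 'TABLE' THEN ' CASCADE CONSTRAINTS' ELSE '' END
       || ';' AS drop_stmt
FROM dba_objects
WHERE owner = 'SYS'
AND created >= TRUNC(SYSDATE)
AND object_type IN ('TABLE', 'VIEW', 'SEQUENCE', 'PROCEDURE', 'FUNCTION', 'PACKAGE');

Indexes, triggers and package bodies are deliberately left out of the list, for the reasons given below.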
Note however that there may be some objects that have been recently created and should be owned by SYS, so double-check what it is you're dropping before you drop it. (On my Oracle 11g XE database, the newest objects in the SYS schema were index and table partitions created about a week and a half ago.)
I don't know what types of objects you have, but there will be some dependencies between object types. In particular, you can't drop a table if another table has foreign-key constraints pointing to it. This answer provides some PL/SQL code to disable all constraints on a table, which you can adapt to drop all constraints, or just drop all foreign-key constraints, if you need to.
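A hedged adaptation of that idea, using DBA_CONSTRAINTS to find and drop the recently changed foreign keys (constraint_type 'R' is referential; again, double-check the selection before running it):

BEGIN
    FOR c IN (SELECT table_name, constraint_name
              FROM dba_constraints
              WHERE owner = 'SYS'
                AND constraint_type = 'R'
                AND last_change >= TRUNC(SYSDATE))  -- fresh imports only
    LOOP
        EXECUTE IMMEDIATE 'ALTER TABLE SYS."' || c.table_name
                       || '" DROP CONSTRAINT "' || c.constraint_name || '"';
    END LOOP;
END;
/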
Also, if a table column uses a type, that table will need to be dropped before dropping the type. Similarly, you may have to take care if types have dependencies on other types.
Other things to be aware of:
You don't need to drop package bodies, just drop the packages and the bodies will go with them.
You don't need to drop triggers on tables and views: the triggers go when the table or view is dropped. Similarly, dropping a table drops all of the indexes on that table.
Views, procedures, functions and packages may depend on other objects but they shouldn't stop those other objects from being dropped. The views/procedures/functions/packages should become invalid, but if they're going to be dropped anyway that doesn't matter.
You don't specify what other types of object you have, so there may well be other issues you encounter.
EDIT: in response to your updated question:
You can drop the objects in the order you specify, once you've dropped the FK constraints. The tables will be the hardest part: once they're all gone everything else should be straightforward.
You don't need to drop indexes as they get dropped automatically when you drop the tables.
You don't need to drop triggers on tables or views as these get dropped automatically when you drop the view or table. (I don't know whether you have any other triggers such as AFTER LOGON ON DATABASE, but such triggers might not be included in exports anyway.)
I only have Oracle XE, which doesn't support Java, so I can't be sure of the exact incantation necessary to drop Java classes. The Oracle documentation for DROP JAVA may be of some help to you.
We are architecting a new database that will make heavy use of schemas to separate logical parts of our database.
An example could be employee and client. We will have a schema for each, and the web services that connect to one will not be allowed in the other.
Where we are hitting problems/concerns is where the data appears very similar between the two schemas. For example both employees and clients have addresses.
We could do something like common.Address. But the mandate to keep the services data access separate is fairly strong.
So it is looking like we will go with employee.Address and client.Address.
However, it would be nice if there was a way to enforce a global Address table definition. Something to prevent the definition of these two Address tables from drifting apart during development. (Note: there will actually be more than two.)
Is there anything like that in SQL Server? Some kind of table "type" or "class" that can be "instantiated" into different schemas? (I am not hopeful here, but I thought I would ask.)
Thoughts, rather than a hard answer...
We have
a common Data schema
Views, Procs etc in a schema per client
An internal "Helper" schema for shared code
Would this work for you?
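For concreteness, a minimal sketch of that layout (all names and the ClientId filter are illustrative assumptions):

CREATE SCHEMA Data;     -- shared tables live here
GO
CREATE SCHEMA Helper;   -- shared/internal code
GO
CREATE SCHEMA ClientA;  -- one schema per client for views, procs etc.
GO
-- Each client schema exposes only its own slice of the common data.
CREATE VIEW ClientA.Address AS
    SELECT AddressId, Line1, City
    FROM Data.Address
    WHERE ClientId = 1;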
My other thought is a database per client. It's easier to set permissions per database than per schema, especially for direct DB access, support, or power-user types.
I think your best bet is a DDL trigger, where you can cause a failure when altering any of your "common" tables.
something like:
CREATE TRIGGER [Dont_Change_CommonTables]
ON DATABASE
FOR DDL_TABLE_EVENTS, GRANT_DATABASE
AS
DECLARE @EventData xml
DECLARE @Message varchar(1000)
SET @EventData = EVENTDATA()
IF (@EventData.value('(/EVENT_INSTANCE/ObjectType)[1]', 'varchar(50)') = 'TABLE'
    AND @EventData.value('(/EVENT_INSTANCE/ObjectName)[1]', 'varchar(50)') IN ('Address'
        ,'etc...'
        --place your table list here
        )
   )
BEGIN
    ROLLBACK
    SET @Message = 'Error! you can not make changes to '
        + ISNULL(LOWER(@EventData.value('(/EVENT_INSTANCE/ObjectType)[1]', 'varchar(50)')), '')
        + ': ' + ISNULL(@EventData.value('(/EVENT_INSTANCE/ObjectName)[1]', 'varchar(50)'), '')
    RAISERROR(@Message, 16, 1)
    RETURN
END
GO
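With the trigger in place, a quick sanity check (assuming Address is on the protected list) should fail with the trigger's message and roll back:

-- Expect: "Error! you can not make changes to table: Address"
ALTER TABLE Address ADD [TestColumn] int NULL;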
Can someone explain to me what views or materialized views are in plain everyday English please? I've been reading about materialized views but I don't understand.
Sure.
A normal view is a query that defines a virtual table -- you don't actually have the data sitting in the table; you create it on the fly by executing the query.
A materialized view is a view where the query gets run and the data gets saved in an actual table.
The data in the materialized view gets refreshed when you tell it to.
A couple use cases:
We have multiple Oracle instances where we want to have the master data on one instance, and a reasonably current copy of the data on the other instances. We don't want to assume that the database links between them will always be up and operating. So we set up materialized views on the other instances, with queries like select a,b,c from mytable@master and tell them to refresh daily.
Materialized views are also useful in query rewrite. Let's say you have a fact table in a data warehouse with every book ever borrowed from a library, with dates and borrowers. And that staff regularly want to know how many times a book has been borrowed. Then build a materialized view as select book_id, book_name, count(*) as borrowings from book_trans group by book_id, book_name, set it for whatever update frequency you want -- usually the update frequency for the warehouse itself. Now if somebody runs a query like that for a particular book against the book_trans table, the query rewrite capability in Oracle will be smart enough to look at the materialized view rather than walking through the millions of rows in book_trans.
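In Oracle DDL, that warehouse example might look like the following sketch (names from the example above; the refresh options are illustrative):

CREATE MATERIALIZED VIEW book_borrowings_mv
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE   -- lets the optimizer answer matching queries from this table
AS
SELECT book_id, book_name, COUNT(*) AS borrowings
FROM book_trans
GROUP BY book_id, book_name;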
Usually, you're building materialized views for performance and stability reasons -- flaky networks, or doing long queries off hours.
A view is basically a "named" SQL statement. You can reference views in your queries much like a real table. When accessing the view, the query behind the view is executed.
For example:
create view my_counter_view(num_rows) as
select count(*)
from gazillion_row_table;
select num_rows from my_counter_view;
Views can be used for many purposes, such as providing a simpler data model, implementing security constraints, enabling SQL query re-use, and working around SQL shortcomings.
A materialized view is a view where the query has been executed and the results have been stored as a physical table. You can reference a materialized view in your code much like a real table. In fact, it is a real table that you can index, declare constraints on, etc.
When accessing a materialized view, you are accessing the pre-computed results. You are NOT executing the underlying query. There are several strategies for keeping the materialized view up-to-date. You will find them all in the documentation.
Materialized views are rarely referenced directly in queries. The point is to let the optimizer use "Query Rewrite" mechanics to internally rewrite a query such as the COUNT(*) example above to a query on the precomputed table. This is extremely powerful as you don't need to change the original code.
There are many uses for materialized views, but they are mostly used for performance reasons. Other uses are: replication, complicated constraint checking, and workarounds for deficiencies in the optimizer.
Long version: -> Oracle documentation
A view is a query on one or more tables. A view can be used just like a table, to select from or to join with other tables or views. A materialized view is a view that has been fully evaluated and whose rows have been stored in memory or on disk. Therefore, each time you select from a materialized view, there is no need to perform the query that produces the view, and the results are returned instantly.
For example, a view may be a query such as SELECT account, SUM(payment) FROM payments GROUP BY account with a large number of payments in the table but not many accounts. Each time this view is used the whole table must be read. With a materialized view, the result is returned instantly.
The non-trivial issue with materialized views is to update them when the underlying data is changed. In this example, each time a new row is added to the payments table, the row in the materialized view that represents the account needs to be updated. These updates may happen synchronously or periodically.
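In Oracle, for instance, the synchronous flavour of this example might be sketched as follows (a materialized view log is required, and fast-refreshable aggregates need the COUNT columns):

CREATE MATERIALIZED VIEW LOG ON payments
    WITH SEQUENCE, ROWID (account, payment) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW account_payments_mv
    REFRESH FAST ON COMMIT
AS
SELECT account,
       SUM(payment)   AS total_payment,
       COUNT(payment) AS payment_cnt,  -- required for fast refresh of SUM
       COUNT(*)       AS row_cnt
FROM payments
GROUP BY account;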
Yes. Materialized views are views with a base table underneath them. You define the view and Oracle creates the base table underneath it automatically.
By executing the view and placing the resulting data in the base table you gain performance.
They are useful for a variety of reasons. Some examples of why you would use a materialized view are:
1) A view that is complex may take a long time to execute when referenced
2) A view included in complex SQL may yield poor execution plans leading to performance issues
3) You might need to reference data across a slow DBLINK
A materialized view can be set up to refresh periodically.
You can specify a full or partial refresh.
Please see the Oracle documentation for complete information.
Let's say I have DatabaseA with TableA, which has these fields: Id, Name.
In another database, DatabaseB, I have TableA which has these fields: DatabaseId, Id, Name.
Is it possible to setup a replication publication that will send:
DatabaseA.dbid, DatabaseA.TableA.Id, DatabaseA.TableA.Name
to DatabaseB.TableA?
Edit:
The reason I'm asking is that I need to combine multiple databases (with identical schemas) into a single database, with as little latency as possible. Replication seemed like a good place to start (need to replicate data from one place to another), but I'm just in the brainstorming phase. I would definitely be open to suggestions on how to accomplish this without using replication.
There might be an easier way to do it, but the first thing I thought of is wrapping TableA in an indexed view on the source database and then replicating the view as a table (i.e., type = "indexed view logbased"). I don't think this would work with merge replication, though.
So, that would roughly be like:
CREATE VIEW dbo.TableA_with_dbid WITH SCHEMABINDING AS
SELECT CONVERT(int, 1) AS dbid, Id, Name -- stand-in for DatabaseA's id: hard-code one value per source database (indexed views require deterministic expressions)
FROM dbo.TableA
GO
CREATE UNIQUE CLUSTERED INDEX IX_TableA_with_dbid ON dbo.TableA_with_dbid (Id) -- or whatever your PK is
GO
EXEC sp_addarticle ...,
    @source_object = 'TableA_with_dbid',
    @destination_table = 'TableA',
    @type = 'indexed view logbased',
    ...
Big caveat: indexed views have a lot of requirements that may not be appropriate for your application. For example, certain options have to be set any time you update the base table.
(In response to the edit in your question...) This won't work for combining multiple sources into one table. AFAIK, an object in a subscribing database can only come from one published article. And you can't do an indexed view on the subscribing side since UNION is not allowed in an indexed view. (The docs don't explicitly state UNION ALL is disallowed, but it wouldn't surprise me. You might try it just in case.) But it still does answer your explicit question: the dbid would be in the replicated table.
Are you aggregating these events in one place from multiple sources? Replication only comes from one source - it's one-to-one - so the source ID doesn't seem like it would make much sense.
If you're aggregating data from multiple sources, maybe linked servers and triggers is a better choice, and if that's the case, then you could absolutely include any information about the source that you want.
If you can clarify your question to describe the purpose, it would help us find the best solution.
UPDATED FROM NEW DETAIL IN QUESTION:
Does this solution sound like it might be what you need?
Set up AFTER triggers on the source databases that send any changed rows to a holding table in the central repository database. These rows can include additional columns, like "Source" and "Change type" (for insert, delete, etc.).
Some central process watches the table and processes new rows (or runs periodically - once/minute, maybe), incorporating them into the central database
You could adjust how frequently the check/merge process runs on the server based on your needs (even running it constantly to handle new rows as they appear, perhaps even with an AFTER trigger on that table as well).
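A hedged sketch of such a trigger (the CentralRepo database, dbo.ChangeHolding table, and the Id/Name columns are all illustrative assumptions; a three-part name works on the same server, otherwise use a linked-server four-part name):

CREATE TRIGGER trg_TableA_capture ON dbo.TableA
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO CentralRepo.dbo.ChangeHolding (Source, ChangeType, Id, Name)
    -- inserts: row in inserted, not in deleted
    SELECT DB_NAME(), 'I', i.Id, i.Name FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM deleted d WHERE d.Id = i.Id)
    UNION ALL
    -- updates: row in both pseudo-tables
    SELECT DB_NAME(), 'U', i.Id, i.Name FROM inserted i
    WHERE EXISTS (SELECT 1 FROM deleted d WHERE d.Id = i.Id)
    UNION ALL
    -- deletes: row in deleted, not in inserted
    SELECT DB_NAME(), 'D', d.Id, d.Name FROM deleted d
    WHERE NOT EXISTS (SELECT 1 FROM inserted i WHERE i.Id = d.Id);
END;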