Let's say you have six Oracle database servers which are basically identical, but represent different factories.
For easier reporting, we could make a nice big view on a seventh server that selects in via #dblink1-6. Works fine 99% of the time. Someone kicks the cord at plant 5, your view is dead for all plants. In this case, we want to just show the five that are working.
I cannot push the data from the six servers into the 7th, the 7th has to look out to 1-6. We can't use a materialized view because that's not live data in this case... often it could be, but not with linked servers where the outside server can't push data in.
What can I write into a view that basically says if this dblink works, union in a select statement otherwise don't?
As you've found, a view that queries across a dblink will be marked invalid the first time it is accessed after the dblink becomes inaccessible.
My preferred solution would be to use materialized views so that the seventh server would always have access to at least some data - but in your case, you'd prefer to have no data rather than non-live data, so that's not an option.
In which case you need something to catch the "dblink inaccessible" exception and hide it from the view. The only way I can think of to solve that one is to query the tables using a pipelined function, which would swallow the exception and return zero rows if the dblink is down. Your original view would then UNION ALL the queries across six pipelined functions. Unfortunately I'm pretty sure this solution would perform very poorly in comparison with your original view, because it will not be able to do things like pushing predicates into the view (effectively the pipelined function will force the equivalent of a FTS across every dblink that's available, every time the query is run). Since your purpose is reporting, this may or may not be a big issue.
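The "swallow the exception and return zero rows" idea can be illustrated outside the database. The following is only a sketch of the pattern in Python, not Oracle PL/SQL: the names `rows_or_nothing`, `all_plants` and the `plant_*` functions are hypothetical, and `ConnectionError` stands in for whatever error an unreachable dblink would raise.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple[str, int]


def rows_or_nothing(fetch: Callable[[], Iterable[Row]]) -> Iterator[Row]:
    """Yield all rows from one source, or no rows if the source is unreachable."""
    try:
        # Read everything first so a failure part-way through yields no partial data.
        rows = list(fetch())
    except ConnectionError:
        return
    yield from rows


def all_plants(sources: Iterable[Callable[[], Iterable[Row]]]) -> Iterator[Row]:
    """Play the role of the view that UNION ALLs the per-plant functions."""
    for fetch in sources:
        yield from rows_or_nothing(fetch)


# Hypothetical stand-ins for the queries across each dblink.
def plant_1() -> Iterable[Row]:
    return [("plant1", 10), ("plant1", 20)]


def plant_5() -> Iterable[Row]:
    raise ConnectionError("link to plant 5 is down")


def plant_6() -> Iterable[Row]:
    return [("plant6", 7)]


result = list(all_plants([plant_1, plant_5, plant_6]))
```

Note that each wrapper reads its whole source before any filtering can happen, which mirrors the performance concern above: predicates applied by the caller are not pushed down to the remote side.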
Note: I've never actually done this, so this answer is just "an idea to try".
Greeting,
Recently I've started to work on an application, where 8 different modules are using the same table at some point in the workflow. This table has an Instead-Of trigger, which is 5,000 lines long (where the first 500 and last 500 lines are common to all modules, and then each module has its own 500 lines of code).
Since the number of modules is going to grow and I want to keep things as clear (and separate) as possible, I was wondering whether there is some sort of best practice for splitting a trigger into stored procedures, or should I leave it all in one place?
P.S. Are there going to be any performance penalties for calling procedures from the trigger and passing 15+ parameters to them?
Bearing in mind that the inserted and deleted pseudo-tables are only accessible from within trigger code, and that they can contain multiple rows, you're facing two choices:
Process the rows in inserted and deleted in a RBAR [1] fashion, to be able to pass scalar parameters to the stored procedures, or,
Copy all of the data from inserted and deleted into table variables that are then passed to the procedures as appropriate.
I'd expect either approach to impose some [2] performance overhead, just from the copying.
That being said, it sounds like too much is happening inside the triggers themselves - does all of this code have to be part of the same transaction that performed the DML statement? If not, consider using some form of queue (a table of requests or Service Broker, say) in which to place information on work to perform, and then process the data later - if you use Service Broker, you could have it inspect a shared message and then send appropriate messages to dedicated endpoints for each of your modules, as appropriate.
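A minimal sketch of the "table of requests" variant, using Python's built-in `sqlite3` purely for illustration. The table names (`orders`, `work_queue`), the handler functions and `drain_queue` are all hypothetical, and SQLite triggers fire once per row, unlike SQL Server triggers, which see the multi-row inserted and deleted pseudo-tables. The point is only that the trigger records the work and a separate step dispatches it per module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, module TEXT, amount INTEGER);
    CREATE TABLE work_queue (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        order_id INTEGER NOT NULL,
        module TEXT NOT NULL,
        processed INTEGER NOT NULL DEFAULT 0
    );
    -- The trigger only records that work is needed; it holds no module logic.
    CREATE TRIGGER orders_enqueue AFTER INSERT ON orders
    BEGIN
        INSERT INTO work_queue (order_id, module) VALUES (NEW.id, NEW.module);
    END;
    """
)

handled = []


def handle_billing(order_id):
    handled.append(("billing", order_id))


def handle_shipping(order_id):
    handled.append(("shipping", order_id))


# One dedicated handler per module, looked up by the module name in the queue.
HANDLERS = {"billing": handle_billing, "shipping": handle_shipping}


def drain_queue(conn):
    """Process pending requests outside the transaction that did the DML."""
    pending = conn.execute(
        "SELECT id, order_id, module FROM work_queue "
        "WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for queue_id, order_id, module in pending:
        HANDLERS[module](order_id)
        conn.execute("UPDATE work_queue SET processed = 1 WHERE id = ?", (queue_id,))
    conn.commit()
    return len(pending)


conn.executemany(
    "INSERT INTO orders (module, amount) VALUES (?, ?)",
    [("billing", 100), ("shipping", 5), ("billing", 40)],
)
conn.commit()
processed_count = drain_queue(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM work_queue WHERE processed = 0"
).fetchone()[0]
```

With this split, adding a module means adding a handler rather than another 500 lines to the trigger, at the cost of the work no longer being part of the original transaction.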
[1] Row By Agonizing Row - using either a cursor or something else that simulates one to access each row in turn - usually frowned upon in a set-based language like SQL.
[2] How much is impossible to know without getting into the specifics of your code and probably trying all possible approaches and measuring the result.
I don't think there is a meaningful performance penalty in this case.
Anyway, it is bad practice to write it all inside the trigger (when it is 5000 lines long...).
I think the main consideration is maintainability, which will be much better if you split it into several SPs.
Here's the issue:
The database is highly normalized, and one particular query relies on the multiple relationships in the database. The query is designed to join all the tables, construct the entire object, and then return a list of those objects.
In other words this particular query does a lot of work.
Now, the query does only return X number of items as it supports pagination, but we also need to know the total count of items that are there.
Currently these two tasks are independent, but highly similar queries in our Domain Service. Ideally what I'd like to do is combine these two queries so that the call to the server happens once, rather than twice, and that the joins happen only once.
Output/Reference parameters don't work, and since the function is designed to return an IQueryable of items, I'm stuck on how to return this list of items as well as the total count.
I'm sure someone's come across this before - any thoughts?
A count of items across the joined tables is not the same thing as returning a subset of those records. They just happen to share a certain amount of SQL code (specifically to join the tables). RIA does the actual paging server-side so you are actually getting a slightly different query for every paging call.
A count operation would also operate much faster than the record query as SQL counts can often be performed using database indexes only (although Linq may well optimise this for you to the same end result... Clever Linq coders!).
As you would only be requesting the total count once (on page load I assume), then you begin paging through multiple queries on different portions of the data, you are hitting different parts of the database with every call.
You are better off treating them as two distinct functions (as you were) and wear the slight overhead of an additional server call. There is always somewhere else you could make bigger gains (caching etc).
When in doubt: Do not overcomplicate any process for the sake of only a very small gain.
If the problem is with the client server communication, you can put the count result on the header of the result response.
I have an interesting dilemma. I have a very expensive query that involves doing several full table scans and expensive joins, as well as calling out to a scalar UDF that calculates some geospatial data.
The end result is a resultset that contains data that is presented to the user. However, I can't return everything I want to show the user in one call, because I subdivide the original resultset into pages and just return a specified page, and I also need to take the original entire dataset, and apply group by's and joins etc to calculate related aggregate data.
Long story short, in order to bind all of the data I need to the UI, this expensive query needs to be called about 5-6 times.
So, I started thinking about how I could calculate this expensive query once, and then each subsequent call could somehow pull against a cached result set.
I hit upon the idea of abstracting the query into a stored procedure that would take in a CacheID (Guid) as a nullable parameter.
This sproc would insert the resultset into a cache table using the cacheID to uniquely identify this specific resultset.
This allows sprocs that need to work on this resultset to pass in a cacheID from a previous query and it is a simple SELECT statement to retrieve the data (with a single WHERE clause on the cacheID).
Then, using a periodic SQL job, flush out the cache table.
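The scheme described above might look roughly like the following sketch, written in Python with `sqlite3` rather than as SQL Server stored procedures. Everything here is an assumption made for illustration: the `items` and `result_cache` tables, the `score > 2` filter standing in for the expensive query, and the function names.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, score INTEGER);
    CREATE TABLE result_cache (
        cache_id TEXT NOT NULL,
        item_id INTEGER NOT NULL,
        category TEXT,
        score INTEGER,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE INDEX ix_result_cache ON result_cache (cache_id, score);
    """
)
conn.executemany(
    "INSERT INTO items (category, score) VALUES (?, ?)",
    [("a", 5), ("b", 3), ("a", 9), ("b", 1), ("a", 7)],
)


def run_expensive_query(conn, cache_id=None):
    """Return a cache ID, running the (stand-in) expensive query only if none is given."""
    if cache_id is not None:
        return cache_id
    cache_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO result_cache (cache_id, item_id, category, score) "
        "SELECT ?, id, category, score FROM items WHERE score > 2",
        (cache_id,),
    )
    return cache_id


def get_page(conn, cache_id, page, page_size):
    """Read one page of the cached result set."""
    return conn.execute(
        "SELECT item_id, score FROM result_cache WHERE cache_id = ? "
        "ORDER BY score DESC LIMIT ? OFFSET ?",
        (cache_id, page_size, page * page_size),
    ).fetchall()


def get_counts(conn, cache_id):
    """Aggregate over the same cached result set without recomputing it."""
    return dict(
        conn.execute(
            "SELECT category, COUNT(*) FROM result_cache "
            "WHERE cache_id = ? GROUP BY category",
            (cache_id,),
        ).fetchall()
    )


def flush_cache(conn, max_age_seconds):
    """What the periodic job would do: drop cached rows older than the cutoff."""
    conn.execute(
        "DELETE FROM result_cache WHERE created_at < datetime('now', ?)",
        (f"-{max_age_seconds} seconds",),
    )


cache_id = run_expensive_query(conn)
first_page = get_page(conn, cache_id, page=0, page_size=2)
second_page = get_page(conn, cache_id, page=1, page_size=2)
counts = get_counts(conn, cache_id)
```

Every reader filters on `cache_id`, so an index that leads with that column keeps the follow-up calls cheap; the writes and the periodic delete are where contention would be expected to show up under load.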
This works great, and really speeds things up on zero load testing. However, I am concerned that this technique may cause an issue under load with massive amounts of reads and writes against the cache table.
So, long story short, am I crazy? Or is this a good idea.
Obviously I need to be worried about lock contention, and index fragmentation, but anything else to be concerned about?
I have done that before, especially when I did not have the luxury of editing the application. I think it's a valid approach sometimes, but in general having a cache/distributed cache in the application is preferred, because it reduces the load on the DB more and scales better.
The tricky thing with the naive "just do it in the application" solution is that many times you have multiple applications interacting with the DB, which can put you in a bind if you have no application messaging bus (or something like memcached), because it can be expensive to have one cache per application.
Obviously, for your problem the ideal solution is to be able to do the paging in a cheaper manner, and not need to churn through ALL the data just to get page N. But sometimes it's not possible. Keep in mind that streaming data out of the db can be cheaper than streaming data out of the db back into the same db. You could introduce a new service that is responsible for executing these long queries and then have your main application talk to the db via the service.
Your tempdb could balloon like crazy under load, so I would watch that. It might be easier to put the expensive joins in a view and index the view than trying to cache the table for every user.
All,
Looking for some guidance on an Oracle design decision I am currently trying to evaluate:
The problem
I have data in three separate schemas on the same Oracle db server. I am looking to build an application that will show data from all three schemas, however the data that is shown will be based on real time sorting and prioritisation rules that are applied to the data globally (i.e.: based on the priority weightings applied I may pull back data from any one of the three schemas).
Tentative Solution
Create a VIEW in the DB which maintains logical links to the relevant columns in the three schemas, write a stored procedure which accepts parameterised priority weightings. The application subsequently calls the stored procedure to select the ‘prioritised’ row from the view and then queries the associated schema directly for additional data based on the row returned.
I have concerns over performance where the data is being sorted/ prioritised upon each query being performed but cannot see a way around this as the prioritisation rules will change often. We are talking of data sets in the region of 2-3 million rows per schema.
Does anyone have alternative suggestions on how to provide an aggregated and sorted view over the data?
Querying from multiple schemas (or even multiple databases) is not really a big deal, even inside the same query. Just prepend the table name with the schema you are interested in, as in
SELECT SOMETHING
FROM
SCHEMA1.SOME_TABLE ST1, SCHEMA2.SOME_TABLE ST2
WHERE ST1.PK_FIELD = ST2.PK_FIELD
If performance becomes a problem, then that is a big topic... optimal query plans, indexes, and your method of database connection can all come into play. One thing that comes to mind is that if it does not have to be realtime, then you could use materialized views (aka "snapshots") to cache the data in a single place. Then you could query that with reasonable performance.
Just set the snapshots to refresh at an interval appropriate to your needs.
It doesn't matter that the data is from 3 schemas, really. What's important to know is how frequently the data will change, how often the criteria will change, and how frequently it will be queried.
If there is a finite set of criteria (that is, the data will be viewed in a limited number of ways) which only change every few days and it will be queried like crazy, you should probably look at materialized views.
If the criteria are nearly infinite, then there's no point making materialized views since they won't likely be reused. The same holds true if the criteria themselves change extremely frequently; the data in a materialized view wouldn't help in this case either.
The other question that's unanswered is how often the source data is updated, and how important it is to have the newest information. Frequently updated source data can mean either that a materialized view will get "stale" for some duration or that you may be spending a lot of time refreshing the materialized views unnecessarily to keep the data "fresh".
Honestly, 2-3 million records isn't a lot for Oracle anymore, given sufficient hardware. I would probably benchmark simple dynamic queries first before attempting a fancy (materialized) view.
As others have said, querying a couple of million rows in Oracle is not really a problem, but then that depends on how often you are doing it - every tenth of a second may cause some load on the db server!
Without more details of your business requirements and a good model of your data it's always difficult to provide good performance ideas. It usually comes down to coming up with a theory, then trying it against your database and assessing whether it is "fast enough".
It may also be worth taking a step back and asking yourself how accurate the results need to be. Does the business really need exact values for this query or are good estimates acceptable?
Tom Kyte (of Ask Tom fame) always has some interesting ideas (and actual facts) in these areas. This article describes generating a proper dynamic search query - but Tom points out that when you query Google it never tries to get the exact number of hits for a query - it gives you a guess. If you can apply a good estimate then you can really improve query performance times.
I am writing an application which needs to periodically (each week for example) loop through several million records in a database and execute code on the results of each row.
Since the table is so big, I suspect that when I call SomeObject.FindAll() it is reading all 1.4 million rows and trying to return all the rows in a SomeObject[].
Is there a way I can execute a SomeObject.FindAll() expression, but load the values in a more DBMS friendly way?
Not with FindAll() - which, as you've surmised, will try to load all the instances of the specified type at one time (and, depending on how you've got NHibernate set up may issue a stupendous number of SQL queries to do it).
Lazy loading works only on properties of objects, so for example if you had a persisted type SomeObjectContainer which had as a property a list of SomeObject mapped in such a way that it should match all SomeObjects and with lazy="true", then did a foreach on that list property, you'd get what you want, sort-of; by default, NHibernate would issue a query for each element in the list, loading only one at a time. Of course, the read cache would grow ginormous, so you'd probably need to flush a lot.
What you can do is issue an HQL (or even embedded SQL) query to retrieve all the IDs for all SomeObjects and then loop through the IDs one at a time fetching the relevant object with FindByPrimaryKey. Again, it's not particularly elegant.
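The shape of that ID-first loop, sketched in Python with `sqlite3` instead of NHibernate/HQL so that it can be run stand-alone. The `some_object` table and the helper names are hypothetical; `find_by_primary_key` merely plays the role of FindByPrimaryKey here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_object (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO some_object (payload) VALUES (?)",
    [(f"row-{n}",) for n in range(1, 11)],
)


def all_ids(conn):
    """Fetch only the primary keys, which is far lighter than full objects."""
    return [row[0] for row in conn.execute("SELECT id FROM some_object ORDER BY id")]


def find_by_primary_key(conn, object_id):
    """Load a single row by its primary key."""
    return conn.execute(
        "SELECT id, payload FROM some_object WHERE id = ?", (object_id,)
    ).fetchone()


def process_all(conn, handler):
    """Run handler on each object, holding only one full object at a time."""
    count = 0
    for object_id in all_ids(conn):
        obj = find_by_primary_key(conn, object_id)
        if obj is not None:  # the row may have been deleted since the IDs were read
            handler(obj)
            count += 1
    return count


seen = []
processed = process_all(conn, seen.append)
```

This trades one huge result set for one small query per row, so memory stays flat while the number of round trips grows with the table.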
To be honest, in a situation like that I'd probably turn this into a scheduled maintenance job in a stored proc - unless you really have to run code on the object rather than manipulate the data somehow. It might annoy object purists, but sometimes a stored proc is the right way to go, especially in this kind of batch job scenario.