I'm creating an application that allows me to create scheduled tasks. I have a threaded process that runs in the background every minute, and triggers the following class:
Namespace MyName.space
Public Class RunJob
...
End Class
End Namespace
A job can either be "Run Once" or "Recurring".
I am querying the job table in vb.net, storing the results in a dataset so that I can iterate over them one at a time. I then started thinking about functions to validate if a job should run or not based on its unique criteria.
For example, the simplest of all would be the "Run Once" jobs:
If Type = "Run Once" Then
    JobShouldRun = Helpers.Validate_RunOnce(WhenToRun, RunCount)
ElseIf Type = "Recurring" Then
    ...
End If
The function would check if the "WhenToRun" is within 5 minutes of the current date and time configured on the server. I chose a 5 minute window in the event of a task failure so that a minute later it'll try again and still have 3-4 tries left.
But then I started thinking that I could control more of this within the SQL query that fetches the jobs themselves, and skip validating this "easy" one within vb.net, but I'm unsure which method would be more efficient. So I could limit the result set to begin with in my initial query:
SELECT
JobId
,JobType -- Run Once or Recurring
,WhenToRun
FROM JobTable a
WHERE Active = 1
AND (
  (
  JobType = 'Run Once'
  AND TotalRuns = 0
  AND DATEDIFF(minute, getdate(), a.WhenToRun) BETWEEN 1 AND 5
  )
  OR JobType = 'Recurring' -- outer parentheses keep Active = 1 applied to recurring jobs too
)
Then I started thinking: should I perform all my logic with either UNION or JOINs, or with complex WHERE/AND/OR conditions on the recurring jobs as well? They get quite a bit more complicated... such as weekly jobs on certain days of the week at a certain time, or every "n" years in "Jan|March|Etc" on the last day of the month.
So I started thinking along these lines:
SELECT
JobID
,JobType
...
FROM JobTable
WHERE RecurringType = 'Weekly'
AND ... more conditions based on all the custom job settings
UNION ALL
SELECT
JobID
,...
FROM JobTable
WHERE RecurringType = 'Monthly'
AND ... more conditions based on all the custom job settings
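For illustration, the weekly branch might end up looking something like this (columns such as DaysOfWeek and RunTime are placeholders, not my actual schema):
SELECT
JobId
,JobType
,WhenToRun
FROM JobTable
WHERE Active = 1
AND JobType = 'Recurring'
AND RecurringType = 'Weekly'
-- e.g. DaysOfWeek stored as 'Mon,Wed,Fri'
AND CHARINDEX(LEFT(DATENAME(weekday, GETDATE()), 3), DaysOfWeek) > 0
-- e.g. RunTime stored as a time of day, due within the next 5 minutes
AND DATEDIFF(minute, CAST(GETDATE() AS time), RunTime) BETWEEN 0 AND 5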
This query would end up quite large and complex, but my question is, should I handle this in vb.net or SQL, or the easy stuff in SQL and the more complicated conditions in vb.net? I'm unsure of the performance impact of either direction.
You're basically asking "Where do I put code for business logic? In the app? Or in the database?" It can be a hotly debated topic, but it's sensible to consider both options. There are some tradeoffs with each approach, and it seems kinda cavalier to say one way is right and the other is wrong.
If your BL code is in the app, you get the primary benefit of the robustness of the .Net Framework. I love tsql, but you just can't do as much with it as you can with VB.Net. If I can stereotype developers, I suspect in general they'll be more comfortable with .Net code. For most, it's probably easier to debug than tsql. Since the .Net code is compiled to an assembly, it's not likely to be altered either.
If your BL code is in tsql, you may find that performance is a bit better. You also abstract away some of the complexity from the .Net code and make that code base a bit smaller (some might argue this is a bad thing). If there are bugs that need to be fixed, it's usually easier to redeploy a stored procedure (or user-defined function, view, etc.) than to redeploy an application (especially if it involves multiple workstations). On the downside, it's easy for others to see (steal!) your code, or make changes to it.
As a general guideline, the more complex the business logic is, the more likely I'd be to put it in the app. If it's pretty simple and not likely to change, I'll consider putting it in tsql. That being said, if I had to pick one or the other with no exceptions, I'd put the BL in the app.
I'm new to Salesforce. I have an object Training__c with a field End_Date, and when the End_Date arrives I need to create a task, but I don't know how to track this End_Date, because a date passing isn't something a trigger fires on...
Thanks
Look into time-based workflows (a bit old school; we're encouraged to use flows now, so check scheduled flows out).
It could be a whole scheduled flow (kind of like a nightly batch job) or a scheduled path in your "normal" flow (if you already have one on this object). There are some trailhead modules to get you started:
https://trailhead.salesforce.com/content/learn/modules/record-triggered-flows/get-started-with-triggered-flows
https://trailhead.salesforce.com/content/learn/modules/record-triggered-flows/add-a-scheduled-task-to-your-flow
Roughly speaking you'd set the action to fire "0 days after end date" and it becomes Salesforce's problem to modify the job if the end date changes. It's elegant, code free, fairly easy.
There are some problems with it, such as scale: will there be tens of thousands of records? Another thing is that this will work only for records created/modified since you activated the flow. What about all the old data? And what if I need to modify the flow's definition, will it de-queue all pending actions? (That one was a legit concern with time-based workflows: to edit the workflow you had to deactivate it, but doing so nuked all queued actions.)
So... you may decide to write some code for this after all. Have an Apex batch job running nightly, selecting records with End Date <= TODAY that don't have (open?) tasks yet, and adding them (maybe if a task is completed you'd want to create another one). It's a different solution, requiring you to write a unit test for it too (which isn't necessarily a bad thing), and it's a bit more resilient than flows.
This looks like fairly similar problem solved with a batch: https://salesforce.stackexchange.com/q/118214/799
I'm having a problem on a batch job that has a simple SOQL query that returns a lot of records. More than a million.
The query, as it is, cannot be optimized much further according to SOQL best practices. (At least, as far as I know. I'm not an SF SOQL expert.)
The problem is that I'm getting -
Caused by: javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: Read timed out
I tried bumping up the Jersey read timeout value from 30 seconds to 60 seconds, but it still times out.
Any recommendation on how to deal with this issue? Any recommended value for the read timeout parameter for a query that returns that much data?
The query is like this:
SELECT Id, field1, field2__c, field3__c FROM Object__c
WHERE field2__c = true AND (NOT field3__c LIKE '\u0025Some string\u0025')
ORDER BY field4__c ASC
In no specific order...
- Batches written in Apex time out after 2 minutes, so maybe set the same limit in your Java application.
- Run your query in the Developer Console using the query plan feature (you'll probably have to put a real % in there, not \u0025). Pay attention to which part has a "Cost" value > 1.
- What are the field types? Plain checkbox and text, or some complex formulas?
- Is that text static, or does it change depending on what your app needs? Would you consider filtering out the string in your code rather than in SOQL? It's counter-intuitive to return more records than you really need, but it might be an option.
- Would you consider making a formula field with either the whole logic or just the string search, and then asking SF to index the formula? Or maybe making another field (another checkbox?) with "yes, it contains that text" info, with the value set by a workflow (essentially preparing your data a bit so you can query it efficiently later).
- Read up on skinny tables and see if they're something that could work for you (needs SF support).
- Can you make an analytic snapshot of your data (make a report, have SF save the results to a helper object, query that object)? Even if it just contained lookups to your original source, so you'd always access fresh values, it could help. Might be a storage killer though.
- Have you considered "big objects" and async SOQL?
- I'm not proud of it, but in the past I had some success badgering the SF database. Not via the API, but if I had a nightly batch job that was timing out, I kept resubmitting it and eventually, on the 3rd-5th try, it managed to start. Something in the query optimizer, the creation of the cursor in the underlying Oracle database, caching of partial results... I don't know.
- What's in the ORDER BY? Some date field? If you need records updated since X first, then maybe the replication API could help by getting the ids first.
- Does it make sense to use LIMIT 200, for example? Which API are you using, SOAP or REST? It might be that returning smaller chunks (SOAP: batch size, REST API: special header) would help it finish faster.
- When all else fails (but do contact SF support, make sure you've exhausted the options), maybe restructure the whole thing. Make SF push data to you whenever it changes rather than pulling it. There's the "Streaming API" (CometD implementation, Bayeux protocol, however these are called), "Change Data Capture" and "Platform Events" for nice event-bus-driven architecture decisions, with the ability to replay old events up to 3 days back if the client was down and couldn't listen... But that's a totally different topic.
Suppose I have a stored procedure as follows:
create procedure p_x
as
begin
select 'a','b','c'
select 'c','d','e'
select 'e','f','g'
end
go
This is of course not the real code, but it illustrates enough to be able to ask my questions.
I'm looking for the best performance and the best practices to deal with it.
How will the client tool (eg Informatica Data Quality) calling this procedure react?
Will it receive 3 separate results, just the last query result or all results at once?
Will each separate resultset be sent to the client directly (and will the procedure halt until it has been consumed)? Or is this done after the procedure has finished?
Is it good practice to work this way? I was looking at exchanging an OUTPUT table-type parameter instead, but if I'm correct (based on other posts) that isn't possible (table-valued parameters can only be used as input).
Is there a performance impact to this approach? And if so, what is the most efficient way to do this (e.g. to send just one result back to the client)?
You would be better served by posting your question to the Informatica forums. They should be able to answer your questions precisely and accurately. But I'll give it a go.
How will the tool react? Don't know, but tools that support using stored procedures as a data source will often assume, and consume, a single resultset (the first one). Any others will be ignored. Go ask in their forums.
Will it receive 3 ...? Roughly the same question and answer as the first.
Will each separate query ...? Your procedure produces three resultsets. How the client consumes them is, again, something you should ask in their forums. The procedure itself will not "halt" waiting for the client to do anything.
Is it good practice...? Not in my opinion. Nor is posting a complete nonsense procedure a useful tool for discussing the pros/cons of this approach. Can it be a useful thing to do? Likely. But it is not often done IME. In addition, you are dealing with a tool with which you are not familiar. The simpler you keep things, the better off you are in the long run, regardless of your tools.
A procedure is a unit of work and should do one "thing". If it produces multiple resultsets, one can argue that it ceases to do a single thing since, logically, each resultset represents a set of different (even if related) things. And typically one would expect to see some relationship among the resultsets. If there are no relationships, then the resultsets are obviously different things which violates the idea of a procedure. You might want to review the topic of coupling and cohesion. But I think I see a bigger issue - which I'll address with the next item.
Is there a performance impact ...? This can't really be answered. Performance is always, ALWAYS specific to a particular situation (query, schema, etc). Based on that last sentence, I think you have not made the adjustment to thinking in terms of sets - something that is critical to writing efficient sql. Rather, I'll guess that you are thinking in terms of a loop which includes a select statement and each iteration will produce a set of (perhaps 1 but who knows) rows. If you think you have the "option" to produce just one resultset of 3 rows vs. 3 resultsets of 1 row, then you are most likely stuck in RBAR land. Regardless, this can't really be answered. It is also a question for the Informatica people.
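That said, if sending a single result back is the goal and the individual selects really do share a compatible shape, one minimal sketch (against the placeholder procedure above, not your real code) is to combine them with UNION ALL and a label column:
create procedure p_x_single
as
begin
select 'set1' as which_set, 'a' as c1, 'b' as c2, 'c' as c3
union all
select 'set2', 'c', 'd', 'e'
union all
select 'set3', 'e', 'f', 'g'
end
go
Whether that is sensible depends on whether the resultsets are actually related; if they represent different things, forcing them into one shape just hides the problem.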
I have a production SQL-Server DB (reporting) that has many Stored Procedures.
The SPs are publicly exposed to the external world in different ways
- some users have direct access to the SPs,
- some are exposed via a WebService,
- while others are encapsulated as interfaces through a DCOM layer.
The user base is large and we do not know exactly which user-set uses which method of accessing the DB.
We get frequent (about one every other month) requests from user-sets for modifying an existing SP by adding one column, or a group of columns, to the existing output, all else remaining the same.
We initially started doing this by modifying the existing SP and adding the newly requested columns to the end of the output. But this broke the custom tools built by some other user bases as their tool had the number of columns hardcoded, so adding a column meant they had to modify their tool as well.
Also, for some columns complex logic is required to get the data into the report, which meant the SP's performance degraded, affecting all users, even those who did not need the new column.
We are thinking of various ways to fix this:
1 Default Parameters to control flow
Update the existing SP and control the new functionality with a flag passed as a defaulted parameter. If the parameter is set to true, the new code path runs; by default it is false, so existing callers get the old behaviour. (A rough sketch follows the list below.)
Advantage
A new object is not required.
Ongoing maintenance is not affected.
Testing overhead remains under control.
Disadvantage
Since an existing SP is modified, it will need testing of existing functionality as well as new functionality.
Since we have no inkling of how the client tools are calling the SPs, we can never be sure that we have not broken anything.
It will be difficult to handle if the same report gets modified again with more requests: more flags will be needed and the code will become unreadable.
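A rough sketch of what I mean by the flag approach; the procedure and column names here are placeholders, not our real report:
CREATE PROCEDURE dbo.rpt_Example
    @IncludeNewColumns bit = 0 -- defaults to the old behaviour, so existing callers are unaffected
AS
BEGIN
    IF @IncludeNewColumns = 0
        SELECT 'a' AS Col1, 'b' AS Col2; -- original output, unchanged
    ELSE
        SELECT 'a' AS Col1, 'b' AS Col2, 'c' AS NewCol; -- extended output, only when requested
END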
2 New Stored procedure
A new stored procedure will be created for any requirement that changes the signature (input/output) of the SP. The new SP will call the original stored procedure for the existing output and add the logic for the new requirement on top of it.
Advantage
The benefit is that there is no impact on the existing procedure, hence no testing is required for the old logic.
Disadvantage
New objects need to be created in the database whenever changes are requested, which adds database maintenance overhead.
Will the execution plan change based on adding a new parameter? If yes then this could adversely affect users who did not request the new column.
Considering that an SP is a public interface to the DB, and interfaces should be immutable, should we go for option 2?
What is the best practice, or does it depend on a case-by-case basis, and what should be the main driving factors when choosing an option?
Thanks in advance!
Quoting a disadvantage of your first option:
It will be difficult to handle if the same report gets modified again with more requests: more flags will be needed and the code will become unreadable.
Personally I feel this is the biggest reason not to modify an existing stored procedure to accommodate the new columns.
When bugs come up in a stored procedure that has several branches, it can become very difficult to debug. Also, as you hinted at, the execution plan can change with branching/IF statements (see "sql using different execution plans when running a query and when running that query inside a stored procedure?").
This is very similar to object-oriented coding, and your instinct is correct that it's best to extend existing objects instead of modifying them.
I would go for approach #2. You will have more objects, but at least when an issue comes up, you will be able to know the affected stored procedure has limited scope/impact.
Over time I've learned to grow objects/data structures horizontally, not vertically. In other words, just make something new, don't keep making existing things bigger and bigger and bigger.
Ok. #2. Definitely. No doubt.
#1 says: "change the existing procedure", causing things to break. No way that's a good thing! Your customers will hate you. Your code just gets more complex meaning it is harder and harder to avoid breaking things leading to more hatred. It will go horribly slowly, and be impossible to tune. And so on.
For #2 you have a stable interface. No hatred. Yay! Seriously, "yay" as in "I still have a job!" as opposed to "boo, I got fired for annoying the hell out of my customers". Seriously. Never ever do #1 for that reason alone. You know this is true. You know it!
Having said that, record what people are doing. Take a user-id as a parameter. Log it. Know your users. Find the ones using old crappy code and ask them nicely to upgrade if necessary.
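A minimal sketch of that logging idea (the table, procedure and parameter names are all made up):
CREATE TABLE dbo.ProcUsageLog (
    ProcName sysname NOT NULL,
    CallerId varchar(100) NULL,
    CalledAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
CREATE PROCEDURE dbo.rpt_Example_Logged
    @CallerId varchar(100) = NULL -- new and optional, so existing callers keep working
AS
BEGIN
    -- record who called, and when
    INSERT INTO dbo.ProcUsageLog (ProcName, CallerId)
    VALUES (OBJECT_NAME(@@PROCID), @CallerId);

    -- ...existing report logic, unchanged...
    SELECT 'a' AS Col1, 'b' AS Col2;
END
GO
Over time the log tells you which callers still rely on the old shape, so you know who to chase.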
Your reason given to avoid number 2 is proliferation. But that is only a problem if you don't test stuff. If you do test stuff properly, then proliferation is happening anyway, in your tests. And you can always tune things in #2 if you have to, or at least isolate performance problems.
If the fatter procedure is really great, then retrofit the skinny version with a slimmer version of the fat one. In SQL this is tricky, but copy/paste and cut down your select column list works. Generally I just don't bother to do this. Life is too short. Having really good test code is a much better investment of time, and data schema tend to rarely change in ways that break existing queries.
Okay. Rant over. Serious message. Do #2, or at the very least do NOT do #1 or you will get yourself fired, or hated, or both. I can't think of a better reason than that.
Easier to go with #2. Nullable SP parameters can create some very difficult-to-locate bugs. Although, I do employ them from time to time.
Especially when you start getting into joins on nulls and ANSI settings. The way you write the query will change the results dramatically. KISS (keep things simple, stupid).
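A quick illustration of the NULL trap with an optional parameter (throwaway names):
CREATE TABLE #Orders (OrderId int, Region varchar(10) NULL);
INSERT INTO #Orders VALUES (1, 'East'), (2, NULL);

DECLARE @Region varchar(10) = NULL;

-- With ANSI_NULLS on, Region = NULL is never true, so this returns no rows
SELECT OrderId FROM #Orders WHERE Region = @Region;

-- Explicit NULL handling returns the row whose Region is NULL
SELECT OrderId FROM #Orders WHERE Region = @Region OR (@Region IS NULL AND Region IS NULL);

DROP TABLE #Orders;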
Also, if it's a parameterized search for reporting or displaying, I might consider a super-fast fetch of data into a LINQ-able object. Then you can search an in-memory list rather than re-fetching from the database.
#2 could be a better option than #1, particularly considering the third disadvantage of #1, since requirements keep changing most of the time. I feel this way because the disadvantages outweigh the advantages on either side.
I would also vote for #2. I've seen a few stored procedures which take #1 to the extreme: the SP has a parameter @Option and a few parameters @param1, @param2, .... The net effect is that this single stored procedure tries to play the role of many stored procedures.
The main disadvantage to #2 is that there are more stored procedures. It may be more difficult to find the one you're looking for, but I think that is a small price to pay for the other advantages you get.
I also want to make sure that you don't just copy and paste the original stored procedure and add some columns; I've seen too many of those as well. If you are only adding a few columns, you can call the original stored procedure and join in the new columns, along the lines of the sketch below. This will incur a performance penalty if those columns were readily available before, but you won't have to change your original stored procedure (as you would when refactoring it for good performance with no duplicated code), nor will you have to maintain two copies of the code (as you would with copy-and-paste for performance).
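Something along these lines; all names are made up, and the derived table stands in for wherever the extra data really lives:
CREATE PROCEDURE dbo.rpt_Original -- stand-in for the real, untouched procedure
AS
    SELECT 'a' AS Col1, 'b' AS Col2;
GO
CREATE PROCEDURE dbo.rpt_Original_v2
AS
BEGIN
    CREATE TABLE #original (Col1 char(1), Col2 char(1));

    -- reuse the existing procedure exactly as it is
    INSERT INTO #original (Col1, Col2)
    EXEC dbo.rpt_Original;

    -- join in the newly requested column(s)
    SELECT o.Col1, o.Col2, x.NewCol
    FROM #original o
    LEFT JOIN (SELECT 'a' AS Col1, 'c' AS NewCol) x ON x.Col1 = o.Col1;
END
GO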
I am going to suggest a couple of other options based on the options you gave.
Alternative option #1: Add another parameter, but instead of making it a defaulted flag, base it on the customer name. That way Customer A can get his specialized report and Customer B can get his slightly different customized report. This adds a ton of work, as updates to the 'Main' portion would have to be copied to all the specialty customer versions.
You could do this with branching 'if' statements.
Alternative option #2: Add new stored procedures, and just add the customer's name to the stored procedure's name. Maintenance-wise this might be a little more difficult, but it will achieve the same end result: each customer gets his own report type.
Option #2 is the one to choose.
You yourself listed the (dis)advantages.
While you will be adding new objects to the db as requirements change, add only the necessary objects and keep each new SP from growing large and difficult to maintain.
I'd like suggestions for the design of a CRUD business app using Silverlight 4, the Business Application Template, WCF RIA Services and the Entity Framework 4. The app tracks lab test results performed on material samples. It replaces a (difficult to maintain) existing web application. Lab tests results are stored in two "SampleData" tables made up of hundreds of fields. The tables have a one to one relationship. I combined the two tables into one using Entity Framework's Table Per Type Inheritance which I'm very happy with. Note: I've decided not to change the database design to avoid destroying the existing application, but it was considered.
My dilemma is how to break up this huge table. Each record represents a material sample that is tested. The logical grouping of fields is by lab test. I envision my UI having multiple tabs or separate pages - one for each test. The problem at this point is that I'm sucking in ALL the fields yet only displaying a few in a paged DataGrid and there is a noticeable delay. Instead of one giant entity it might be nice to have several "Lab Test" entities (each representing a type of test) that are sub-sets of my one giant TPT Inheritance table. How would I do this? The base SampleData table/entity contains header fields plus several child test results fields. The second derived table/entity contains more test result fields linked to the base by SampleID. If split up I'd need to maintain the header info with each Lab Test entity.
I'm willing to stick with one giant table/entity (despite a slight performance penalty). Still, I'm wondering the best way to create my UI with this one entity. Can a DataForm be tabbed? If I make a dashboard with links to lab tests how do I keep header info in sync with each test page?
I know this is a broad question. I'm hoping to get suggestions on a good design path that will allow me to grow the app as new lab tests are added (making an even bigger entity). I'd hope to find a path that simplifies maintenance and takes advantage of the RAD experience Microsoft is promoting.
Thanks in advance!
I scanned the post discussing the database design and must say that, based on what you said and the fact that you've already got users asking for more tests (repeating values), I wish you'd reconsider the db redesign. You can create a flat view to simulate the existing flat samples-data table and use that to minimize breakage in the existing application.
But you've already made that decision, so how about reversing the situation? Instead of fixing the database, add code to the domain service that transforms the data from its flat layout, leaving out all the null values.
One idea is to write a view that un-flattens the data and leaves out the null no-test rows. The query will raise eyebrows (I'll probably get flamed for this) because it looks nasty, but in reality the DBMS does a fine job optimizing and performing it (in Oracle anyway). I've had great results making a view something like this:
create view programmer_exp_unflat as
select programmer_id, 'C#' as lang, csharp_yrs as yrs from programmer_exp_flat where csharp_yrs is not null
union
select programmer_id, 'Java', java_yrs from programmer_exp_flat where java_yrs is not null
union
select programmer_id, 'Cobol', cobol_yrs from programmer_exp_flat where cobol_yrs is not null
-- ...repeat for each remaining column
It's backwards and ugly no matter how you look at it, but it reduces your result set to a bare minimum, with no need to break things into categories. New test values require modifying the view but, depending on UI flexibility and business rules, might not require any other changes.
It makes coding at the UI more difficult, as it would have been with the right database design in the first place, but your query result is reduced to only the tests that had been completed. If your users are flexible the UI could be designed to show the test results as a list making display a piece of cake. Your current design pretty much forces you to modify the UI and database with each and every new test.
These are the types of challenges that make being a developer so much fun, and why all the marketing-gimmick sample CRUD applications that can be built in five minutes are worthless in the real world.
I'm answering (and accepting) my own question to increase my stack overflow accept rate, but my "answer" is that I have found no answer yet. Because I've had to move on with the project I continue to use one giant entity. I've also moved away from Silverlight and turned the project into a WPF app due to various struggles with Silverlight such as inherent asynchronous data access.