ColdFusion: Compare Two Query Results from Same Database - sql-server

I've done some research on this site for an issue I'm having; however, the solutions I'm finding are not exactly what I'm looking for, or the implementation doesn't relate to what I'm trying to do. Or, simply put, I just can't seem to figure it out. Here is my issue.
We have a monthly query that we run and send to a third party, listing physicians with their degree, specialty, and clinic. I have the query established already. Recently, though, they asked us to export only the results that are new since the previous month, instead of the whole results list. So I thought I would create a tool, starting by simply importing the previous month's data. Then I would take the query I had been using, put it in a ColdFusion page, run it, and have it show me the records that are new for the current month compared to the previous month.

When I run the report of new data each month, it saves that data in the database with the columns r_month and r_year, which simply mean report month/year. To initially populate the database I imported October's data as a base, with r_month/r_year being "10" and "2014" respectively. There are 674 records. Then I created my page with a button that runs the same query and saves those results, with r_month and r_year saved as "11" and "2014" respectively. When I do that, I have 682 records. So, for the month of November, there are 8 "different" or new records compared to the previous month (October).

My question is: what is the best way to run a query that takes the data from October (10/2014), compares it to November's data (11/2014), and gives me just the 8 records that are new in November?
Sorry this is long, but I wanted to give you as much detail as possible. I don't really have a code sample I can provide, because the way I was attempting this before (using loops, etc.) was just not working. I tried looping through the previous month's query and the current month's query, trying to find the difference, but that wasn't working. Once again, I've tried similar samples I've found on here, but they are either not what I'm looking for, or I just can't figure them out. Basically, at the end of the process there needs to be a button that exports only the new records (in this example, the 8) into an Excel sheet that we can simply email to them.
Any help would be greatly appreciated.

SOLUTION 1 - Since you are using SQL server you can do this pretty easily within the query. You have already logged the previous data so you presumably have a key for the "old" physicians in your log table. Try something like this:
<cfquery name="getNewPhys" datasource="#dsn#">
    SELECT *
    FROM sourceTable
    WHERE physID NOT IN
        (SELECT physID
         FROM logTable
         WHERE daterange BETWEEN <cfqueryparam value="#somerange#" cfsqltype="cf_sql_timestamp">
                             AND <cfqueryparam value="#someotherrange#" cfsqltype="cf_sql_timestamp">)
</cfquery>
You would have to plug in your own values and variables, but you get the idea.
NOTE: this is pseudo-code; as shown, you would OF COURSE use cfqueryparam for any of your variables.
SOLUTION 2
Another way to do this is by adding a dateAdded or lastUpdated column. Every time a row is updated, you update the lastUpdated column with the current date/time. Then selecting recent records is a matter of selecting any records which have been updated within your range. That's what Leigh suggested in her comment.
I would add one other comment. You seem to be trying to solve this problem without changing anything in your data table. That's not going to work. You need to think about your schema a bit more. For example, solution 2 would involve adding an additional column, and you could even add a SQL Server trigger that automatically updates that field whenever the record is updated (see the sketch below). Wouldn't that work?
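A minimal sketch of that approach, assuming a physicians table keyed by physID (the table name is assumed; physID is borrowed from Solution 1):
-- Add the tracking column; existing rows get the current date/time.
ALTER TABLE physicians
ADD lastUpdated DATETIME NOT NULL DEFAULT GETDATE()
GO
-- Keep lastUpdated current whenever a row changes.
CREATE TRIGGER trg_physicians_lastUpdated ON physicians
AFTER UPDATE
AS
BEGIN
    UPDATE p
    SET lastUpdated = GETDATE()
    FROM physicians p
    INNER JOIN inserted i ON i.physID = p.physID
END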
I still think we are missing something. Are you perchance overwriting your data each time? Or producing duplicate records - 674 this month, 682 next month with duplicates? If so, that's what you need to correct. Anything else is going to be a bolt-on solution that creates more problems down the road.

Step 1 - Add a computed column to your table. Make sure you persist the data so you can index it. The computation should result in values like '201401' for January 2014, etc. Let's call that column YearMonth.
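One way to build that column, assuming the log table stores the report period in integer columns r_month and r_year (column names from the question; the table name physician_log is assumed):
-- PERSISTED so the value is stored and can be indexed.
ALTER TABLE physician_log
ADD YearMonth AS (CAST(r_year AS CHAR(4)) + RIGHT('0' + CAST(r_month AS VARCHAR(2)), 2)) PERSISTED

CREATE INDEX IX_physician_log_YearMonth ON physician_log (YearMonth)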
Then your code and query look like this:
<cfset ControlYearMonth = "201410"> <!--- October 2014 --->
<cfquery name="getNewRecords" datasource="#dsn#">
    SELECT field1, field2, etc
    FROM yourtable
    WHERE YearMonth = <cfqueryparam value="#ControlYearMonth#" cfsqltype="cf_sql_varchar">
    EXCEPT
    SELECT field1, field2, etc
    FROM yourtable
    WHERE YearMonth < <cfqueryparam value="#ControlYearMonth#" cfsqltype="cf_sql_varchar">
</cfquery>

Related

How to delete only a few values from a SQL table output

If you notice, I am keeping the first three fields of a row and blanking them on the following rows, since they are the same, but I am not deleting the entire row because the balance qty keeps changing. I want the first row retained in full; succeeding rows should output only the balance qty for that part no, product description, and weight.
Can someone please help by suggesting a query in Microsoft SQL that gets me the output shown in table 2 from the table 1 output I am currently getting through a query?
This is really a presentation-level question. I'd never do it in SQL, not that it's impossible. Look up the LAG function; you can use that to look at the previous row's description and set it to blank if equal.
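A rough sketch of that idea, with assumed table and column names (PartBalances, PartNo, ProductDescr, Weight, BalanceQty, TxnDate) and assuming the blanked columns are character types; LAG() needs SQL Server 2012 or later:
SELECT
    CASE WHEN LAG(PartNo) OVER (ORDER BY PartNo, TxnDate) = PartNo
         THEN '' ELSE PartNo END AS PartNo,
    CASE WHEN LAG(PartNo) OVER (ORDER BY PartNo, TxnDate) = PartNo
         THEN '' ELSE ProductDescr END AS ProductDescr,
    CASE WHEN LAG(PartNo) OVER (ORDER BY PartNo, TxnDate) = PartNo
         THEN '' ELSE CAST(Weight AS VARCHAR(20)) END AS Weight,
    BalanceQty
FROM PartBalances
ORDER BY PartNo, TxnDate
Rows after the first one for each part then show only the balance qty, which is the layout described above.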

SQL/pivot Calculated fields, is this possible

I'm using Microsoft SQL Server and Excel. I'm having trouble getting my head around how to calculate some fields so that the whole thing runs faster. I have a large data set that gets dropped into Excel and stuck into a pivot table.
The table at its simplest will contain a number of fields similar to the below.
Date user WorkType Count TotalTime
My issue is that I need to calculate an average in a particular way. Each user may have several work types on any given day. The formula I have is, for each Date & User, Sum(TotalTime)/Sum(Count), to get me the following:
Date user Average
Currently I dump a select query into Excel, apply a formula to a column to get my averages, then construct the pivot table using the personal details and the averages.
The calculation over 20,000+ rows, however, takes about 5-7 minutes.
So my question is: is it possible to do that type of calculation in either SQL or the pivot table to cut down the processing time? I'm not very confident with pivot tables, and I'm fairly inexperienced at SQL compared to the people here. I can manage bits of this, but pulling it all together with the conditions of matching Date and User is beyond me right now.
I could parse the recordset into an array to do my calculations that way before it gets written to the spreadsheet, but I just feel that there should be a better way to achieve the same end.
Pre-calculated aggregates in SQL can go very wrong in an Excel Pivot.
Firstly, you can accidentally take an average of an average.
Secondly, once users start re-arranging the pivot you can get very strange sub-totals and totals.
Try to ensure you do all of your aggregation in one place.
If possible, try to use SQL with SSRS; you can base a report on a parameterised stored procedure. That way you push all of the hard work onto the SQL box, and you restrict users from pivoting things around improperly.
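For example, a bare-bones sketch of that aggregation done in SQL, using the column names from the question (the table name WorkData is assumed, as is that Count and TotalTime are numeric):
-- One output row per Date and user; the 1.0 forces decimal division.
SELECT [Date],
       [user],
       SUM(TotalTime) * 1.0 / NULLIF(SUM([Count]), 0) AS AverageTime
FROM WorkData
GROUP BY [Date], [user]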
SELECT Raw_Data.ID,
       Raw_Data.fldDate,
       CONVERT(datetime,
               DATEADD(second, SUM(DATEDIFF(second, 0, Raw_Data.TotalHandle)) / SUM(Raw_Data.[Call Count]), 0),
               108) AS avgHandled
FROM Raw_Data
WHERE Raw_Data.fldDate BETWEEN '05-12-2016' AND '07-12-2016'
GROUP BY Raw_Data.fldDate, Raw_Data.ID
For anyone interested, here are the results of my searching. Thank you for the help that pointed me in the right direction. It seems quite clumsy with the conversions due to a time datatype, but it's working.

Looking for a suggestion on how to go about looping through rows of a table and changing a value based on other values with the same ID

I am looking for some ideas on the best way to go about doing this.
I would like to use something like a for each loop, but I know that can be difficult and not good practice in SQL. I have a table full of 'comments' of which each have a unique commentID. It is associated with a table of 'Deals' by the DealID. Each comment has a DealID associated with it, and since multiple comments can be made on a single deal, several comments may have the same DealID associated with them.
I have a CurrentComment attribute in my comments table which is either 0 or 1 (1 being the most recent comment). Because of some issues in our DB, I had to reset every comment to have a 0 for the 'current comment' value.
What I want to do is go through the entire table of comments, and for each unique DealID, set the most recently made comment (associated with that DealID) to have a value of 1 for the current comment.
I'm thinking I would want to look at all of the comments associated with a single DealID, and the largest CommentID value would be the most recently made comment, so I would change that CurrentComment Value.
Any input/suggestions on how to go about something like this is much appreciated!
You do not need a for loop to do this. SQL Server is set up to do the exact thing you are asking about; you just have to think about the problem a little differently. SQL Server brings back all of the rows you will need to update, assuming your criteria are set correctly. You just need to specify how you want to update your columns. You can use values from other tables, or you can use specific values.
I would recommend sorting your data using a date value, if you have one. Updating a comment that was made today (rather than one from yesterday) is a better method because you are certain that you are updating the most recent comment.
-- Mark the latest comment (highest CommentId) for each deal as current.
UPDATE c
SET CurrentComment = 1
FROM Comments c
WHERE NOT EXISTS (SELECT TOP 1 1
                  FROM Comments cc
                  WHERE c.CommentId < cc.CommentId
                    AND cc.DealId = c.DealId)
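If the comments table has a creation date, the same pattern works with the date instead of the id, per the sorting suggestion above (CommentDate is an assumed column name):
UPDATE c
SET CurrentComment = 1
FROM Comments c
WHERE NOT EXISTS (SELECT TOP 1 1
                  FROM Comments cc
                  WHERE cc.DealId = c.DealId
                    AND cc.CommentDate > c.CommentDate)
Ties on CommentDate could mark more than one comment per deal, so CommentId remains the safer tie-breaker.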

How can I handle the time consuming SQL?

We have a table with 6 million records, and a SQL query which needs around 7 minutes to return the result. I don't think the SQL can be optimized any further.
The query time causes our WebLogic to throw the max stuck thread exception.
Is there any recommendation for how I can handle this problem?
Following is the query; it's hard for me to change it:
SELECT *
FROM table1
WHERE trim(StudentID) IN ('354354', '0')
  AND concat(concat(substr(table1.LogDate, 7, 10), '/'), substr(table1.LogDate, 1, 5))
      BETWEEN '2009/02/02' AND '2009/03/02'
  AND TerminalType = '1'
  AND RecStatus = '0'
ORDER BY StudentID, LogDate DESC, LogTime
I know it's time-consuming to compare dates as strings, but as mentioned before, I cannot change the table structure...
LogDate was defined as a string in the format mm/dd/yyyy, so we need to substring and concat it before we can use BETWEEN ... AND .... I think it's hard to optimize here.
The odds are that this query is doing a full table scan, because your WHERE conditions are unlikely to be able to take advantage of any indexes.
Is LogDate a date field or a text field? If it's a date field, then don't do the substr's and concat's. Just say "LogDate BETWEEN '2009-02-02' AND '2009-03-02'" or whatever the date range is. If it's defined as a text field, you should seriously consider redefining it to a date field. (If your date really is text and is written mm/dd/yyyy, then your ORDER BY ... LogDate DESC is not going to give useful results if the dates span more than one year.)
Is it necessary to do the trim on StudentID? It is far better to clean up your data before putting it in the database than to try to clean it up every time you retrieve it.
If LogDate is defined as a date and you can trim studentid on input, then create indexes on one or both fields and the query time should fall dramatically.
Or if you want a quick and dirty solution, create an index on "trim(studentid)".
If that doesn't help, give us more info about your table layouts and indexes.
SELECT * ... WHERE trim(StudentID) IN ('354354','0')
If this is a normal construct, then you need a function-based index, because without it you force the DB server to perform a full table scan.
As a rule of thumb, you should avoid the use of functions in the WHERE clause as much as possible. The trim(StudentID) and substr(table1.LogDate,7,10) prevent DB servers from using any index or applying any optimization to the query. Try to use native data types as much as possible, e.g. DATE instead of VARCHAR for LogDate. StudentID should also be managed properly in the client software, e.g. by trimming the data before INSERT/UPDATE.
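SQL Server (the question's tag) has no function-based indexes as such; the usual equivalent is an indexed computed column, sketched here with an assumed column name:
ALTER TABLE table1
ADD StudentID_trimmed AS LTRIM(RTRIM(StudentID)) PERSISTED

CREATE INDEX IX_table1_StudentID_trimmed ON table1 (StudentID_trimmed)

-- The query can then filter on StudentID_trimmed IN ('354354', '0') and use the index.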
If your database supports it, you might want to try a materialized view.
If not, it might be worth implementing something similar yourself: have a scheduled job run a query that does the expensive trims and concats and refreshes a table with the results, so that you can run your query against the better table and avoid the expensive work. Or use triggers to maintain such a table.
But the query time causes our weblogic to throw the max stuck thread exception.
If the query takes 7 minutes and cannot be made faster, you have to stop running this query real-time. Can you change your application to query a cached results table that you periodically refresh?
As an emergency stop-gap before that, you can implement a latch (in Java) that allows only one thread at a time to execute this query. A second thread would immediately fail with an error (instead of bringing the whole system down). That is probably not making users of this query happy, but at least it protects everyone else.
I updated the query; could you give me some advice?
Those string manipulations make indexing pretty much impossible. Are you sure you cannot at least get rid of the "trim"? Is there really redundant whitespace in the actual data? If not, dropping the trim would let the database narrow down on just a single student_id, which should speed things up a lot.
You want a composite index on (student_id, log_date), and hopefully the complex log_date condition can still be resolved using a index range scan (for a given student id).
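A sketch of that composite index, using the column names from the query:
-- Seek on StudentID, then range-scan LogDate within each student.
CREATE INDEX IX_table1_StudentID_LogDate ON table1 (StudentID, LogDate)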
Without any further information about what kind of query you are executing and whether you are using indexes or not, it is hard to give any specific advice.
But here are a few general tips.
Make sure you use indexes on the columns you often filter/order by.
If it is only a certain query that is way too slow, then perhaps you can avoid executing that query by maintaining the results automatically as the database changes. For example, instead of a count() you can usually keep a running count stored somewhere.
Try to remove the trim() from the query by automatically calling trim() on your data before/while inserting it into the table. That way you can simply use an index to find the StudentID.
Also, the date filter should be possible natively in your database. Without knowing which database you use it is hard to be exact, but something like this should probably work: LogDate BETWEEN '2009-02-02' AND '2009-03-02'
If you also add an index on all of these columns together (i.e. StudentID, LogDate, TerminalType, RecStatus and EmployeeID), then it should be lightning fast.
Without knowing what database you are using and what your table structure is, it's very difficult to suggest any improvement, but queries can generally be improved by using indexes, hints, etc.
In your query the following part
concat(concat(substr(table1.LogDate,7,10),'/'), substr(table1.LogDate,1,5)) BETWEEN '2009/02/02' AND '2009/02/02'
looks odd. BETWEEN '2009/02/02' AND '2009/02/02'?? What are you trying to do?
Can you post your table structure here?
And 6 million records is not a big thing anyway.
As has been said a lot already, your problem is the date field. You definitely need to change your date from a string field to a native date type. If it is a legacy field that your app uses in exactly this way, you can still create a to_date(logdate, 'MM/DD/YYYY') function-based index that transforms your "string" date into a real date and allows the fast BETWEEN search already mentioned, without modifying your table data.
This should speed things up a lot.
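On Oracle that function-based index might look like this (assuming the mm/dd/yyyy format from the question and that every LogDate value parses cleanly):
CREATE INDEX ix_table1_logdate ON table1 (TO_DATE(LogDate, 'MM/DD/YYYY'))
Queries would then need to compare on the same expression, e.g. TO_DATE(LogDate, 'MM/DD/YYYY') BETWEEN DATE '2009-02-02' AND DATE '2009-03-02', so the optimizer can use the index.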
With the little information you have provided, my hunch is that the following clause gives us a clue:
... WHERE trim(StudentID) IN ('354354','0')
If you have large numbers of records with an unidentified student (i.e. studentID = 0), an index on studentID would be very imbalanced.
Of the 6 million records, how many have studentId=0?
Your main problem is that your query is treating everything as a string.
If LogDate is a Date WITHOUT a time component, you want something like the following
SELECT * FROM table1
WHERE StudentID IN (:SearchStudentId,0)
AND table1.LogDate = :SearchDate
AND TerminalType='1'
AND RecStatus='0'
ORDER BY EmployeeID, LogDate DESC, LogTime
If LogDate has a time component, and SearchDate does NOT have a time component, then something like this. (The .99999 will set the time to 1 second before midnight)
SELECT * FROM table1
WHERE StudentID IN (:SearchStudentId,:StudentId0)
AND table1.LogDate BETWEEN :SearchDate AND :SearchDate+0.99999
AND TerminalType='1'
AND RecStatus='0'
ORDER BY EmployeeID, LogDate DESC, LogTime
Note the use of bind variables for the parameters that change between calls. It won't make the query much faster, but it is 'best practice'.
Depending on your calling language, you may need to add TO_DATE, etc, to cast the incoming bind variable into a Date type.
If StudentID is a char (usually the reason for using trim()) you may be able to get better performance by padding the variables instead of trimming the field, like this (assuming StudentID is a char(10)):
StudentID IN (lpad('354354',10),lpad('0',10))
This will allow the index on StudentID to be used, if one exists.

SQL Server - Automatically Calculate total every day

I need to keep a daily statistic of the count of records in a table.
Is there a way to automate counting the records daily and writing the result into another table? Maybe using a SQL Agent Job or something like that?
I'm using SQL Server 2008.
Thank you!
Edit:
If I delete all records from 1/1/2010 today, the statistic still needs to show that on 1/1/2010 there were 500 records at the end of the day. So solely using GetDate() and summing up doesn't work, as that method would give me 0 records for 1/1/2010.
Add a column to your table like so:
ALTER TABLE My_Table
ADD insert_date DATETIME NOT NULL DEFAULT GETDATE()
You can then query against that as SQL intended.
Insert trigger: update counting table record for today (insert if not already created)
Delete trigger: decrement counting table record for today (insert if not already created)
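A sketch of the insert-trigger idea; DailyCounts and My_Table are assumed names, and the delete trigger would mirror this with a subtraction:
CREATE TABLE DailyCounts (
    count_date DATE PRIMARY KEY,
    record_count INT NOT NULL
)
GO
CREATE TRIGGER trg_MyTable_DailyCount ON My_Table
AFTER INSERT
AS
BEGIN
    -- Create today's row if it does not exist yet.
    IF NOT EXISTS (SELECT 1 FROM DailyCounts WHERE count_date = CAST(GETDATE() AS DATE))
        INSERT INTO DailyCounts (count_date, record_count) VALUES (CAST(GETDATE() AS DATE), 0)

    -- Add the number of rows just inserted.
    UPDATE DailyCounts
    SET record_count = record_count + (SELECT COUNT(*) FROM inserted)
    WHERE count_date = CAST(GETDATE() AS DATE)
END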
In my opinion you answered your own question with the best option. Create a job that just calls a stored procedure which gets the count and stamps it with the date.
The other option mentioned by Tom H. is a better choice, but if you can't alter the table for whatever reason, the job is a good option.
Another option could be to place an insert trigger on that table to increment a count somewhere, but that could affect performance depending on how you implement it.
Setting up the job is simple through the SQL Management studio interface with a schedule of how often to run and what stored procedure to call. You can even just write the command directly in the command window of the step instead of calling a sp.
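The command for such a job step can be as small as this (DailyRecordCounts and My_Table are assumed names):
-- Run once per day near midnight to snapshot the current row count.
INSERT INTO DailyRecordCounts (count_date, record_count)
SELECT CAST(GETDATE() AS DATE), COUNT(*)
FROM My_Table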
Tom's answer with OMG_Ponies' addendum about tombstoning instead of deleting is the best answer. If you are concerned about how many records were in the table on a certain day, there is a good possibility that someone one day will ask for information about those records on that day.
If that is a no-go, then as others have said, create a second table with a field for the PK of the last record of the day and a field for that day's count. Then create a job that runs at the end of each day, counts all records with OriginalTable.PK > MAX(NewCountTable.Last_PK_Field), and adds that row (Last_PK_Field, Count) to NewCountTable.
SQL Job is good -- yes.
Or you could add a date column to the table defaulted to GETDATE(). This wouldn't work if you don't want your daily counts to be affected by folks deleting records after the fact.
