I need help using SUM and GROUP BY in SQL Server.
I am generating the query based on 5 tables. I have tried in SQL Server.
Some parts of the query are working, but when I advance the query, I get the wrong results/data.
The problem is that the data is processed twice instead of once on every group by field, e.g. farmer_ID, where the farmer bears or has two or more records.
This happens when i add more tables to the join - on one or two tables, the sum values are okay.
Hence I get farmer_sales = 200 instead of 100.
Kindly let me know how I can get some help
Thanks
David
You can use the outer join (Left or Right) and choose the table that have only one record for each item
Another solution you can use Keyword Distinct before the column name
Nobody here will be able to help without table definitions, the requirements & the query.
With these issues I find it helpful to write the query so you get the proper rows you need. It sounds like you have either dodgy data or incomplete join conditions but it's not possible to tell that without the above. You can debug your data by finding a problem (e.g. farmer_sales) and working through the raw data & query from there. You will have an incomplete PK/FK relationship in your query or a missing constraint allowing bad data. Or you have misunderstood the requirement or the requirement does not makes sense to the data model.
Once you have the query working correctly you can add the aggregations.
One bit of general advice I can give is that adding DISTINCT is almost always the wrong approach.
I've got a timesheet table with an id, and total hours.
I now have a
select ts.totalhours, fn_NormalTime(ts.id), fn_Overtime(ts.id) from tsentries ts
I have done this improve readability in the original query as well as to centralize logic with regards to holidays, Sunday time, etc.
However, each function in turn now does a select from the table to get items like date, rules, etc.
Is there a way to get sql to not redo the retrieve for every function or will internal caching suffice? Looking to improve speed.
Using functions in the SELECT part of the query will lead to bad performances because the SELECT inside the function will need to be executed for each line you return from your query. There is no way to cache this result this way. The query optimizer will only do a good job if you write all the data retrieval logic inside the same query.
I have a little query about Codeigniter's db queries. (Order by)
$this->db->select('*, MONTH(date) as mnth');
$this->db->where('school_id', $school_id);
$this->db->group_by('MONTH(date)');
$this->db->order_by('date', 'desc');
$query = $this->db->get('events');
In above piece of code i want to get events with respect to each month but it is returning me only the first event of each month.. I have tried it with php loops but i want it without loops.. Hoping for good responses.
Best Regards
Muhammad Saud
GROUP BY will aggregate all of the column values that match the parameter into one row. You generally need to use aggregate functions like SUM in the SELECT statement with GROUP BY. If you're selecting columns that aren't being used with an aggregate function, then you'd include them in the GROUP BY. I'm not sure exactly what you're trying to achieve, but I don't think the GROUP BY is doing what you'd expect - maybe try removing it (or add more column names to it).
I have 4million records in one of my tables. I need to get the last 25 records that have been added in the last 1 week.
This is how my current query looks
SELECT TOP(25) [t].[EId],
[t].[DateCreated],
[t].[Message]
FROM [dbo].[tblEvent] AS [t]
WHERE ( [t].[DateCreated] >= Dateadd(DAY, Datediff(DAY, 0, Getdate()) - 7, 0)
AND [t].[EId] = 1 )
ORDER BY [t].[DateCreated] DESC
Now I do not have any indexes running for this table and do not intend to have one. This query takes about 10-15 seconds to run and my apps times-out, now is there a way to better it?
You should create an index on EId, DateCreated or at least DateCreated
Without this only way of optimising this that I can think of would be to maintain the last 25 in a separate table via an insert trigger (and possibly update and delete triggers as well).
If you have an ID in the table that is autoincrement (not the Eid but a separate PK) you can order by ID desc instead of DateCreated, that might make your order by faster.
otherwise you do need an index (but your question says you do not want that).
If the table has no indexes to support the query you are going to be forced to perform a table scan.
You are going to struggle to get around the table scan aspect of that - and as the table grows, the response time will get slower.
You are going to have to endevour to educate your client as to the problems going forward they face, and that they should consider an index. They may be saying no, you need to show the evidence to support the reasoning, show them times with / without, and make sure the impact to the record insertion is also shown, it's a relatively simple cost / benefit / detriment for the adding of the index / not adding of it. If they insist on no index, then you have no choice but to extend your timeouts.
You should also try query hint:
http://msdn.microsoft.com/en-us/library/ms181714.aspx
With option FAST n -- number of rows.
We have a table with 6 million records, and then we have a SQL which need around 7 minutes to query the result. I think the SQL cannot be optimized any more.
The query time causes our weblogic to throw the max stuck thread exception.
Is there any recommendation for me to handle this problem ?
Following is the query, but it's hard for me to change it,
SELECT * FROM table1
WHERE trim(StudentID) IN ('354354','0')
AND concat(concat(substr(table1.LogDate,7,10),'/'),substr(table1.LogDate,1,5))
BETWEEN '2009/02/02' AND '2009/03/02'
AND TerminalType='1'
AND RecStatus='0' ORDER BY StudentID, LogDate DESC, LogTime
However, I know it's time consuming for using strings to compare dates, but someone wrote before I can not change the table structure...
LogDate was defined as a string, and the format is mm/dd/yyyy, so we need to substring and concat it than we can use between ... and ... I think it's hard to optimize here.
The odds are that this query is doing a full-file scan, because you're WHERE conditions are unlikely to be able to take advantage of any indexes.
Is LogDate a date field or a text field? If it's a date field, then don't do the substr's and concat's. Just say "LogDate between '2009-02-02' and '2009-02-03' or whatever the date range is. If it's defined as a text field you should seriously consider redefining it to a date field. (If your date really is text and is written mm/dd/yyyy then your ORDER BY ... LOGDATE DESC is not going to give useful results if the dates span more than one year.)
Is it necessary to do the trim on StudentID? It is far better to clean up your data before putting it in the database then to try to clean it up every time you retrieve it.
If LogDate is defined as a date and you can trim studentid on input, then create indexes on one or both fields and the query time should fall dramatically.
Or if you want a quick and dirty solution, create an index on "trim(studentid)".
If that doesn't help, give us more info about your table layouts and indexes.
SELECT * ... WHERE trim(StudentID) IN ('354354','0')
If this is normal construct, then you need a function based index. Because without it you force the DB server to perform full table scan.
As a rule of thumb, you should avoid as much as possible use of functions in the WHERE clause. The trim(StundentID), substr(table1.LogDate,7,10) prevent DB servers from using any index or applying any optimization to the query. Try to use the native data types as much as possible e.g. DATE instead of VARCHAR for the LogDate. StudentID should be also managed properly in the client software by e.g. triming the data before INSERT/UPDATE.
If your database supports it, you might want to try a materialized view.
If not, it might be worth thinking about implementing something similar yourself, by having a scheduled job that runs a query that does the expensive trims and concats and refreshes a table with the results so that you can run a query against the better table and avoid the expensive stuff. Or use triggers to maintain such a table.
But the query time cause our weblogic to throw the max stuck thread exception.
If the query takes 7 minutes and cannot be made faster, you have to stop running this query real-time. Can you change your application to query a cached results table that you periodically refresh?
As an emergency stop-gap before that, you can implement a latch (in Java) that allows only one thread at a time to execute this query. A second thread would immediately fail with an error (instead of bringing the whole system down). That is probably not making users of this query happy, but at least it protects everyone else.
I updated the query, could you give me some advices ?
Those string manipulations make indexing pretty much impossible. Are you sure you cannot at least get rid of the "trim"? Is there really redundant whitespace in the actual data? If so, you could narrow down on just a single student_id, which should speed things up a lot.
You want a composite index on (student_id, log_date), and hopefully the complex log_date condition can still be resolved using a index range scan (for a given student id).
Without any further information about what kind of query you are executing and wheter you are using indexes or not it is hard to give any specific information.
But here are a few general tips.
Make sure you use indexes on the columns you often filter/order by.
If it is only a certain query that is way too slow, than perhaps you can prevent yourself from executing that query by automatically generating the results while the database changes. For example, instead of a count() you can usually keep a count stored somewhere.
Try to remove the trim() from the query by automatically calling trim() on your data before/while inserting it into the table. That way you can simply use an index to find the StudentID.
Also, the date filter should be possible natively in your database. Without knowing which database it might be more difficult, but something like this should probably work: LogDate BETWEEN '2009-02-02' AND '2009-02-02'
If you also add an index on all of these columns together (i.e. StudentID, LogDate, TerminalType, RecStatus and EmployeeID than it should be lightning fast.
Without knowing what database you are using and what is your table structure, its very difficult to suggest any improvement but queries can be improved by using indexes, hints, etc.
In your query the following part
concat(concat(substr(table1.LogDate,7,10),'/'), substr(table1.LogDate,1,5)) BETWEEN '2009/02/02' AND '2009/02/02'
is too funny. BETWEEN '2009/02/02' AND '2009/02/02' ?? Man, what are yuu trying to do?
Can you post your table structure here?
And 6 million records is not a big thing anyway.
It is told a lot your problem is in date field. You definitely need to change your date from a string field to a native date type. If it is a legacy field that is used in your app in this exact way - you may still create a to_date(logdate, 'DD/MM/YYYY') function-based index that transforms your "string" date into a "date" date, and allows a fast already mentioned between search without modifying your table data.
This should speed things up a lot.
With the little information you have provided, my hunch is that the following clause gives us a clue:
... WHERE trim(StudentID) IN ('354354','0')
If you have large numbers of records with unidentified student (i.e. studentID=0) an index on studentID would be very imbalanced.
Of the 6 million records, how many have studentId=0?
Your main problem is that your query is treating everything as a string.
If LogDate is a Date WITHOUT a time component, you want something like the following
SELECT * FROM table1
WHERE StudentID IN (:SearchStudentId,0)
AND table1.LogDate = :SearchDate
AND TerminalType='1'
AND RecStatus='0'
ORDER BY EmployeeID, LogDate DESC, LogTime
If LogDate has a time component, and SearchDate does NOT have a time component, then something like this. (The .99999 will set the time to 1 second before midnight)
SELECT * FROM table1
WHERE StudentID IN (:SearchStudentId,:StudentId0)
AND table1.LogDate BETWEEN :SearchDate AND :SearchDate+0.99999
AND TerminalType='1'
AND RecStatus='0'
ORDER BY EmployeeID, LogDate DESC, LogTime
Note the use of bind variables for the parameters that change between calls. It won't make the query much faster, but it is 'best practice'.
Depending on your calling language, you may need to add TO_DATE, etc, to cast the incoming bind variable into a Date type.
If StudentID is a char (usually the reason for using trim()) you may be able to get better performance by padding the variables instead of trimming the field, like this (assuming StudentID is a char(10)):
StudentID IN (lpad('354354',10),lpad('0',10))
This will allow the index on StudentID to be used, if one exists.