SQL Select on Records that Meet Criteria from Another Table - sql-server

I've got a very complex query, and I'm trying to give a simple example of one of the sub-tables I'm having problems with. If you need more information or context, please let me know.
I've posted a CSV file with some sample data here:
We make cakes, and 99% of our cakes are made by us. The 1% is when we have a cake delivered to us from a subcontractor and we 'Receive' and 'Audit' it.
What I wanted to do was to write something like this:
JOIN Instruction
ON Cake.Cake_Key = Instruction.Cake_Key
JOIN Steps
ON Instruction.Step_Key = Steps.Step_Key
WHERE MIN(Steps.Step_Key) = 1
This fails because you can't have an aggregate in the WHERE clause.
The desired results would be:
Cake C 13 Receive
Cake C 14 Audit
Cake D 15 Receive
Cake D 16 Audit
Thank you in advance for your help!

Take a look at the HAVING keyword:
It works more or less the same as the WHERE clause, but it filters on aggregate functions and is applied after the GROUP BY clause.
Beware, however, that this can be slow. You should filter down the number of records as much as possible in the WHERE clause, and even consider aggregating the data into a temporary table first.

What you're talking about is the GROUP BY/HAVING clause, so in your case you would need to add something like
GROUP BY Cake.Cake, Instruction.Cake_Instruction_Key, Steps
HAVING MIN(Steps.Step_Key) = 1
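To return every instruction row for the qualifying cakes (as in the desired results), one option is to apply GROUP BY/HAVING per cake in a subquery and join back to the detail rows. Below is a minimal sketch using Python's sqlite3 in place of SQL Server. The sample rows, the Steps.Step column name, and the assumption that Step_Key 1 means 'Receive' are all hypothetical, since the original CSV is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only the key column names come from the question.
conn.executescript("""
CREATE TABLE Cake (Cake_Key INTEGER PRIMARY KEY, Cake TEXT);
CREATE TABLE Steps (Step_Key INTEGER PRIMARY KEY, Step TEXT);
CREATE TABLE Instruction (
    Cake_Instruction_Key INTEGER PRIMARY KEY,
    Cake_Key INTEGER,
    Step_Key INTEGER
);
INSERT INTO Cake VALUES (1, 'Cake A'), (3, 'Cake C'), (4, 'Cake D');
INSERT INTO Steps VALUES (1, 'Receive'), (2, 'Audit'), (3, 'Mix'), (4, 'Bake');
INSERT INTO Instruction VALUES
    (10, 1, 3), (11, 1, 4), (12, 1, 2),
    (13, 3, 1), (14, 3, 2),
    (15, 4, 1), (16, 4, 2);
""")

# The aggregate lives in HAVING inside the subquery, so the outer query
# can still return one row per instruction.
rows = conn.execute("""
SELECT c.Cake, i.Cake_Instruction_Key, s.Step
FROM Cake AS c
JOIN Instruction AS i ON c.Cake_Key = i.Cake_Key
JOIN Steps AS s ON i.Step_Key = s.Step_Key
WHERE c.Cake_Key IN (
    SELECT Cake_Key
    FROM Instruction
    GROUP BY Cake_Key
    HAVING MIN(Step_Key) = 1
)
ORDER BY i.Cake_Instruction_Key
""").fetchall()

for row in rows:
    print(*row)
```

Grouping by Cake_Key only (rather than by the instruction key as well) is what makes MIN(Step_Key) describe the whole cake instead of a single instruction row.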


How to understand TDengine's interval and sliding keywords?

I am using TDengine's continuous query. It is hard for me to understand, since I am used to MySQL. What's more, TDengine supports it with two keywords, "interval" and "sliding". I tested it with SQL like
select count(*) from ${table} interval(1m);
select count(*) from ${table} interval(1m) sliding(30s);
All the data in the table.
Can anyone help explain why I get that result?
You can refer to this link, which explains the usage of interval and sliding in detail.
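As a rough illustration of the idea: interval sets the length of each time window, and sliding sets how far the window start moves forward each time, so a sliding value smaller than the interval produces overlapping windows. The sketch below emulates count(*) over such windows in plain Python. It is not TDengine code; the function name, the sample timestamps (in seconds), and the alignment of window starts to multiples of the step are assumptions made for the example.

```python
from collections import Counter

def window_counts(timestamps, interval, sliding=None):
    """Count events per window [start, start + interval).

    Window starts are assumed to be multiples of the step, where the step
    is `sliding` if given and `interval` otherwise (non-overlapping windows).
    """
    step = interval if sliding is None else sliding
    counts = Counter()
    for ts in timestamps:
        # Latest window start at or before ts, then walk back through
        # every earlier window that still contains ts.
        start = (ts // step) * step
        while start > ts - interval:
            counts[start] += 1
            start -= step
    return sorted(counts.items())

events = [5, 20, 40, 70]  # hypothetical event times in seconds

tumbling = window_counts(events, 60)         # like interval(1m)
overlapping = window_counts(events, 60, 30)  # like interval(1m) sliding(30s)

print(tumbling)
print(overlapping)
```

With sliding(30s), each event falls into two one-minute windows, which is why the second query returns roughly twice as many rows and the counts add up to more than the number of records.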

Groupby and count() with alias and 'normal' dataframe: python pandas versus mssql

Coming from a SQL environment, I am learning some things in Python Pandas. I have a question regarding grouping and aggregates.
Say I group a dataset by Age Category and count the different categories. In MSSQL I would write this:
SELECT AgeCategory, COUNT(*) AS Cnt
GROUP BY AgeCategory
The result set is a 'normal' table with two columns; I named the second column Cnt.
When I want to do the equivalent in Pandas, the result of the groupby has a different format, so I have to reset the index and rename the column on a following line. My code would look like this:
grouped = df.groupby('AgeCategory')['ColA'].count().reset_index()
grouped.columns = ['AgeCategory', 'Count']
My question is whether this can be accomplished in one go. It seems like I am overdoing it, but I lack experience.
Thanks for any advice.
Regards, M.
Use the name parameter of Series.reset_index:
grouped = df.groupby('AgeCategory')['ColA'].count().reset_index(name='Count')
Or, to count all rows in each group:
grouped = df.groupby('AgeCategory').size().reset_index(name='Count')
The difference is that GroupBy.count excludes missing values, while GroupBy.size does not.
More information about aggregation in pandas.
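A small sketch of that difference, using a made-up DataFrame with one missing value in ColA (the data is an assumption for illustration only):

```python
import pandas as pd

# Hypothetical data: group 'A' has one missing ColA value.
df = pd.DataFrame({
    "AgeCategory": ["A", "A", "B"],
    "ColA": [1.0, float("nan"), 3.0],
})

# count() skips the NaN in ColA, so group 'A' is counted once.
by_count = df.groupby("AgeCategory")["ColA"].count().reset_index(name="Count")

# size() counts rows, so group 'A' is counted twice.
by_size = df.groupby("AgeCategory").size().reset_index(name="Count")

print(by_count)
print(by_size)
```

size() is the closer match to SQL's COUNT(*), while count() on a column behaves like COUNT(ColA).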

Query Performance in Access 2007 - drawing on a SQL Server Express backend

I've been banging my head on this issue for a little while now, and decided I should ask for help. I have a table which holds temperature/humidity chart recorder data (currently over 775,000 records), against which I am trying to run a statistical query. The problem is that this often takes up to two minutes, and sometimes does not come back at all, forcing me to close the program (Control-Alt-Delete). At first, I didn't have as much of a problem. It was only after I hit the magical 500k records mark that I started getting serious slowdowns, which got progressively worse as more data was compiled and imported into the table.
Here is the query (pass-through):
SELECT dbo.tblRecorderLogs.strAreaAssigned, Min(dbo.tblRecorderLogs.datDateRecorded) AS FirstRecorderDate, Max(dbo.tblRecorderLogs.datDateRecorded) AS LastRecordedDate,
Round(Avg(dbo.tblRecorderLogs.intTempCelsius),2) AS AverageTempC,
Round(Avg(dbo.tblRecorderLogs.intRHRecorded),2) AS AverageRH,
Count(dbo.tblRecorderLogs.strAreaAssigned) AS Records
FROM dbo.tblRecorderLogs
GROUP BY dbo.tblRecorderLogs.strAreaAssigned
ORDER BY dbo.tblRecorderLogs.strAreaAssigned;
Here is the table structure in which the chart data is stored:
idRecorderDataID Number Primary Key
datDateEntered Date/Time (indexed, duplicates OK)
datTimeEntered Date/Time
intTempCelcius Number
intDewPointCelcius Number
intWetBulbCelcius Number
intMixingGPP Number
intRHRecorded Number
strAssetRecorder Text (indexed, duplicates OK)
strAreaAssigned Text (indexed, duplicates OK)
I am trying to write a program which will allow people to pull data from this table based on Area Assigned, as well as start and end dates. With the dataset size I currently have, this kind of report is simply too much for it to handle (it seems) and the machine doesn't ever return an answer. I've had to extend the ODBC timeout to almost 180 seconds in any queries dealing with this table, simply because of the size. I could use some serious help, if people have some. Thank you in advance!
-- Edited 08/13/2012 @ 1050 hours --
I have not been able to test the query on the SQL Server due to the fact that the IT department has taken control of the machine in question, and has someone logged into it full-time using the remote management console. I have tried an interim step to lessen the impact of the performance issue, but I am still looking for a permanent solution to this issue.
Interim step:
I created a local table mirroring the structure of the dbo.tblRecorderLogs SQL Server table, into which I do an INSERT INTO using the former SELECT statement as its subquery. Any subsequent statistical analysis is then drawn from this 'temporary' local table. After the process is complete, the local table is truncated.
-- Edited 08/13/2012 @ 1217 hours --
Ran the shown query on the SQL Server Management Console, took 1 minute 38 seconds to complete according to the query timer provided by the console.
-- Edited 08/15/2012 @ 1531 hours --
Tried to run query as VBA DoCmd.RunSQL statement to populate a temporary table using the following code:
INSERT INTO tblTempRecorderDataStatsByArea ( strAreaAssigned, datFirstRecord,
datLastRecord, intAveTempC, intAveRH, intRecordCount )
SELECT dbo_tblRecorderLogs.strAreaAssigned, Min(dbo_tblRecorderLogs.datDateRecorded)
AS MinOfdatDateRecorded, Max(dbo_tblRecorderLogs.datDateRecorded) AS MaxOfdatDateRecorded,
Round(Avg(dbo_tblRecorderLogs.intTempCelsius),2) AS AveTempC,
Round(Avg(dbo_tblRecorderLogs.intRHRecorded),2) AS AveRHRecorded,
Count(dbo_tblRecorderLogs.strAreaAssigned) AS CountOfstrAreaAssigned FROM
dbo_tblRecorderLogs GROUP BY dbo_tblRecorderLogs.strAreaAssigned ORDER BY
The problem arises when the code is executed: the query takes so long that it hits the timeout before it finishes. Still hoping for a 'magic bullet' to fix this...
-- Edited 08/20/2012 @ 1241 hours --
The only 'quasi' solution I've found is running the failed query repeatedly (sort of priming the pump, as it were) so that when the query is called again by my program - it has a relative chance of actually completing before the ODBC SQL Server driver times out. Basically, a filthy filthy hack - but I don't have a better one to combat this issue.
I've tried creating a view, which works on the server side - but doesn't speed things up.
The proper fields being aggregated are indexed properly, so I can't make any changes there.
I am only pulling information from the database that is immediately useful to user - no 'SELECT * madness' going on here.
I think I am, officially, out of things to try - aside from throwing raw computing horsepower at the problem, which isn't a solution right now as the item isn't live, and I have no budget to procure better hardware. I will post this as an 'answer' and leave it up until Sept 3rd - where if I do not have better answers, I will accept my own answer and accept defeat.
When I've had to run min/max functions on several fields from the same table, I've often found it quicker to do each column separately as a subquery in the FROM clause of the main/outer query.
So your query would be like this:
SELECT rLogs1.strAreaAssigned, rLogs1.FirstRecorderDate, rLogs2.LastRecordedDate, rLogs3.AverageTempC, rLogs4.AverageRH, rLogs5.Records
FROM (((
(SELECT strAreaAssigned, Min(datDateRecorded) AS FirstRecorderDate FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs1
inner join
(SELECT strAreaAssigned, Max(datDateRecorded) AS LastRecordedDate FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs2
on rLogs1.strAreaAssigned = rLogs2.strAreaAssigned)
inner join
(SELECT strAreaAssigned, Round(Avg(intTempCelsius),2) AS AverageTempC FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs3
on rLogs1.strAreaAssigned = rLogs3.strAreaAssigned)
inner join
(SELECT strAreaAssigned, Round(Avg(intRHRecorded),2) AS AverageRH FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs4
on rLogs1.strAreaAssigned = rLogs4.strAreaAssigned)
inner join
(SELECT strAreaAssigned, Count(strAreaAssigned) AS Records FROM dbo.tblRecorderLogs GROUP BY strAreaAssigned) rLogs5
on rLogs1.strAreaAssigned = rLogs5.strAreaAssigned
ORDER BY rLogs1.strAreaAssigned;
If you take your query and the one above, copy them into the same query window in SQL Server, and display the estimated execution plan, you should be able to compare them and see which one works better.
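Before comparing plans, it can be worth confirming that the two forms return the same rows. The following is a minimal sketch with Python's sqlite3 standing in for SQL Server, a cut-down version of the table, and invented sample rows; it checks equivalence only and says nothing about which form is faster on the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of tblRecorderLogs with made-up rows.
conn.execute(
    "CREATE TABLE tblRecorderLogs ("
    "strAreaAssigned TEXT, datDateRecorded TEXT, intTempCelsius REAL)"
)
conn.executemany(
    "INSERT INTO tblRecorderLogs VALUES (?, ?, ?)",
    [
        ("Lab", "2012-08-01", 20.0),
        ("Lab", "2012-08-03", 21.0),
        ("Vault", "2012-08-02", 18.5),
    ],
)

# All aggregates computed in one pass over the table.
single_pass = conn.execute("""
SELECT strAreaAssigned,
       MIN(datDateRecorded), MAX(datDateRecorded),
       ROUND(AVG(intTempCelsius), 2), COUNT(strAreaAssigned)
FROM tblRecorderLogs
GROUP BY strAreaAssigned
ORDER BY strAreaAssigned
""").fetchall()

# One grouped subquery per aggregate, joined on the grouping column.
split = conn.execute("""
SELECT r1.strAreaAssigned, r1.FirstDate, r2.LastDate, r3.AvgTemp, r4.Records
FROM (SELECT strAreaAssigned, MIN(datDateRecorded) AS FirstDate
      FROM tblRecorderLogs GROUP BY strAreaAssigned) AS r1
JOIN (SELECT strAreaAssigned, MAX(datDateRecorded) AS LastDate
      FROM tblRecorderLogs GROUP BY strAreaAssigned) AS r2
  ON r1.strAreaAssigned = r2.strAreaAssigned
JOIN (SELECT strAreaAssigned, ROUND(AVG(intTempCelsius), 2) AS AvgTemp
      FROM tblRecorderLogs GROUP BY strAreaAssigned) AS r3
  ON r1.strAreaAssigned = r3.strAreaAssigned
JOIN (SELECT strAreaAssigned, COUNT(strAreaAssigned) AS Records
      FROM tblRecorderLogs GROUP BY strAreaAssigned) AS r4
  ON r1.strAreaAssigned = r4.strAreaAssigned
ORDER BY r1.strAreaAssigned
""").fetchall()

print(single_pass == split)
```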

Rails 3, ActiveRecord, PostgreSQL - ".uniq" command doesn't work?

I have the following query:
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15).uniq
and it gives me the error
PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...s"."user_id" WHERE (articles.user_id != 1) ORDER BY Random() L...
When I update the original query to
Article.joins(:themes => [:users]).where(["articles.user_id != ?", current_user.id]).order("Random()").limit(15)#.uniq
the error is gone. In MySQL .uniq works, but in PostgreSQL it does not. Is there an alternative?
As the error states, for SELECT DISTINCT, ORDER BY expressions must appear in the select list.
Therefore, you must explicitly select the expression you are ordering by.
Here is an example. It is similar to your case, but generalized a bit.
Article.select('articles.*, RANDOM()')
.where(:column => 'whatever')
So, explicitly include your ORDER BY expression (in this case RANDOM()) using .select(). As shown above, for your query to return the Article attributes, you must explicitly select them as well.
I hope this helps; good luck.
Just to enrich the thread with more examples: in case you have nested relations in the query, you can try the following statement.
Person.find(params[:id]).cars.select('cars.*, lower(cars.name)').order("lower(cars.name) ASC")
In the given example, you're asking for all the cars of a given person, ordered by model name (Audi, Ferrari, Porsche).
I don't think this is a better way, but it may help to address this kind of situation by thinking in objects and collections instead of in a relational (database) way.
I assume that the .uniq method is translated to a DISTINCT clause in the SQL. PostgreSQL is picky (pickier than MySQL): when using DISTINCT, every expression in the ORDER BY clause must also appear in the select list.
It's a little unclear what you are attempting to do (a random ordering?). In addition to posting the full SQL sent, if you could explain your objective, that might be helpful in finding an alternative.
I just upgraded my 100% working and tested application from 3.1.1 to 3.2.7 and now have this same PG::Error.
I am using Cancan...
@users = User.accessible_by(current_ability).order('lname asc').uniq
Removing the .uniq solves the problem and it was not necessary anyway for this simple query.
Still looking through the change notes between 3.1.1 and 3.2.7 to see what caused this to break.

How do I search a "Property Bag" table in SQL?

I have a basic "property bag" table that stores attributes about my primary table "Card." So when I want to start doing some advanced searching for cards, I can do something like this:
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE dbo.CardProperty.IdPrp = 3 AND dbo.CardProperty.Value = 'Fiend'
SELECT dbo.Card.Id, dbo.Card.Name
FROM dbo.Card
INNER JOIN dbo.CardProperty ON dbo.CardProperty.IdCrd = dbo.Card.Id
WHERE (dbo.CardProperty.IdPrp = 10 AND (dbo.CardProperty.Value = 'Wind' OR dbo.CardProperty.Value = 'Fire'))
What I need to do is to extract this idea into some kind of stored procedure, so that ideally I can pass in a list of property/value combinations and get the results of the search.
Initially this is going to be a "strict" search, meaning that the results must match all elements in the query, but I'd also like to have a "loose" search that matches any of the elements in the query.
I can't quite seem to wrap my head around this one. My previous version of this generated a massive SQL query with a lot of AND/OR clauses in it, but I'm hoping to do something a little more elegant this time. How do I go about doing this?
It seems to me that you have an EAV model here.
If you're using SQL Server 2005 or later, I'd suggest you use the XML datatype for this:
It makes searching much easier thanks to the built-in XML querying capabilities.
If you can't change your model, then look at this:
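Separately from the resource referred to above, one common way to search an unchanged property-bag table is to match any of the requested property/value pairs and then count how many distinct properties each card matched: a "strict" search requires all of them, a "loose" search requires at least one. Here is a minimal sketch using Python's sqlite3 in place of SQL Server. The search_cards function, the sample cards, and the rule that several values for the same property are treated as alternatives are assumptions for illustration, not something from the question.

```python
import sqlite3

def search_cards(conn, criteria, strict=True):
    """criteria is a list of (property_id, value) pairs.

    Values given for the same property are treated as alternatives (OR).
    strict=True requires a match on every distinct property requested;
    strict=False requires a match on at least one.
    """
    if not criteria:
        return []
    match = " OR ".join("(p.IdPrp = ? AND p.Value = ?)" for _ in criteria)
    params = [item for pair in criteria for item in pair]
    needed = len({prop for prop, _ in criteria}) if strict else 1
    sql = f"""
        SELECT c.Id, c.Name
        FROM Card AS c
        JOIN CardProperty AS p ON p.IdCrd = c.Id
        WHERE {match}
        GROUP BY c.Id, c.Name
        HAVING COUNT(DISTINCT p.IdPrp) >= ?
        ORDER BY c.Id
    """
    return conn.execute(sql, params + [needed]).fetchall()

conn = sqlite3.connect(":memory:")
# Hypothetical sample data using the column names from the question.
conn.executescript("""
CREATE TABLE Card (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CardProperty (IdCrd INTEGER, IdPrp INTEGER, Value TEXT);
INSERT INTO Card VALUES (1, 'Imp'), (2, 'Gale'), (3, 'Demon');
INSERT INTO CardProperty VALUES
    (1, 3, 'Fiend'), (1, 10, 'Fire'),
    (2, 3, 'Beast'), (2, 10, 'Wind'),
    (3, 3, 'Fiend'), (3, 10, 'Earth');
""")

wanted = [(3, "Fiend"), (10, "Wind"), (10, "Fire")]
strict_hits = search_cards(conn, wanted, strict=True)
loose_hits = search_cards(conn, wanted, strict=False)

print(strict_hits)
print(loose_hits)
```

Only the placeholders are built dynamically; the values themselves are passed as parameters. In a SQL Server stored procedure, the same GROUP BY/HAVING query could take the criteria from a table-valued parameter or a temporary table instead.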
