How to delete only a few values from a SQL table output - sql-server

If you notice, I am just keeping the first three blocks of a row and blanking them in the following rows, since they repeat, but I am not deleting the entire row because the balance qty keeps changing. I want the first row to be retained in full; the succeeding rows should show only the balance qty, with the part no, product description and weight left blank.
Can someone please suggest a query in Microsoft SQL Server that produces the output shown in table 2 from the table 1 output I am currently getting from a query?

This is really a presentation-level question, and I would normally not do it in SQL, though it is not impossible. Look up the LAG function: you can use it to look at the previous row's value and set the current one to blank if the two are equal.
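A minimal sketch of that approach. The column names PartNo, ProductDescr, Weight, BalanceQty and the ordering column RowSeq are assumptions, not taken from the question, so adjust them to your actual output:
-- Blank out a value when it matches the previous row's value
-- (assumed column names; Weight is cast so it can be replaced with '').
SELECT
    CASE WHEN LAG(PartNo) OVER (ORDER BY RowSeq) = PartNo
         THEN '' ELSE PartNo END                         AS PartNo,
    CASE WHEN LAG(ProductDescr) OVER (ORDER BY RowSeq) = ProductDescr
         THEN '' ELSE ProductDescr END                   AS ProductDescr,
    CASE WHEN LAG(Weight) OVER (ORDER BY RowSeq) = Weight
         THEN '' ELSE CAST(Weight AS varchar(20)) END    AS Weight,
    BalanceQty
FROM YourQueryOutput;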

Related

SELECT with WHERE drops performance

In my application I use queries like
SELECT column1, column2
FROM table
WHERE myDate >= date1 AND myDate <= date2
My application crashes and returns a timeout exception. I copy the query and run it in SSMS. The results pane displays ~ 40 seconds of execution time. Then I remove the WHERE part of the query and run. This time, the returned rows appear immediately in the results table, although the query continues to print more rows (there are 5 million rows in the table).
My question is: how can a WHERE clause affect query performance?
Note: I have not changed the CommandTimeout property in the application; it is left at its default.
Without a WHERE clause, SQL Server is told to just start returning rows, so that's what it does, starting from the first row it can find efficiently (which may be the "first" row in the clustered index, or in a covering non-clustered index).
When you limit it with a WHERE clause, SQL Server first has to go find those rows. That's what you're waiting on: because you don't have an index on myDate (or on date1/date2, if those are columns rather than variables), it needs to examine every single row.
Another way to look at it is to think of a phone book, which is an outdated analogy but gets the job done. Without a WHERE clause, it's like you're asking me to read you off all of the names and numbers in the book. If you add a WHERE clause that is not supported by an index, like read me off the names and numbers of every person with the first name 'John', it's going to take me a lot longer to start returning rows because I can't even start until I find the first John.
Or a slightly different analogy is to think of the index in a book. If you ask me to read off the page numbers for all the terms that are indexed, I can do that from the index, just starting from the beginning and reading through until the end. If you ask me to read off all the page numbers for all the terms that aren't in the index, or a specific unindexed term (like "the"), or even all the page numbers for indexed terms that contain the letter a, I'm going to have a much harder time.
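If myDate really is a column, an index along these lines would let SQL Server seek to the requested range instead of scanning the whole table. This is only a sketch; the INCLUDE list assumes column1 and column2 are the only columns you select:
-- Supporting index for the range predicate on myDate (names assumed).
CREATE INDEX IX_table_myDate
    ON [table] (myDate)
    INCLUDE (column1, column2);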

Pentaho ETL Table Input Iteration

Context
I have a table with Customer information. I want to find the repeat customers in the table based on information like:
First_Name
Last_Name
DOB
Doc_Num
FF_Num
etc.
Now, to compare one customer with the rest of the records in the same table, I need to:
read one record at a time,
and compare this record with the rest, in such a way that if a column does not match,
then the other columns are compared for those records.
Question
Is there a way to make the Table_Input step read or output one record at a time, reading the next record automatically only after processing of the previous record is complete? This process should continue until all the records in the table have been checked/processed.
Also, I would like to know if we can iterate the same procedure instead of reading one record at a time from Table_Input.
Making your Table Input read and write row by row doesn't seem like the best solution, and I don't think it would achieve what you want (e.g. keeping track of previous records).
You could try the Unique rows step, which can redirect a duplicate row (using the key you choose) to another flow where it will be treated differently (or simply delete it if you don't want it). From what I can see, you'll want multiple Unique rows steps to check each of the columns.
Is there a way to make the Table_Input step read or output one record at a time but it should read the next record automatically after the processing of the previous record is complete?
Yes, it is possible to change the row buffer between the steps: set Nr of rows in rowset to 1. But changing this property is not recommended unless you are running low on memory; it may make the tool behave abnormally.
Now, as per the comments shared, I see there are two questions:
1. You need to check the count of duplicate entries:
You can achieve this either with a Group By step or with the Unique rows step, as answered by astro11. You can easily get a count per name, and if the count is greater than 1 you can consider it a duplicate (see the SQL sketch after this list).
2. Checking on the two data rows:
You want to validate two names, e.g. "John S" and "John Smith". Ideally both should be considered the same name, and hence a duplicate.
First of all, this is a data quality issue, and no tool will treat these rows as the same out of the box. What you can do is use a step called "Fuzzy match". Based on the algorithm you choose, this step will give you a measure of the closest match for the names. To achieve this you need a separate MASTER table with all the possible names. You can use the "Jaro Winkler" algorithm to get the closest match.
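For the duplicate count in point 1, the plain-SQL equivalent would look roughly like this. The table name Customer is an assumption; the columns come from the question:
-- Count exact duplicates on the chosen key columns (assumed table name).
SELECT First_Name, Last_Name, DOB, COUNT(*) AS duplicate_count
FROM Customer
GROUP BY First_Name, Last_Name, DOB
HAVING COUNT(*) > 1;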
Hope this helps :)

Creating sequence in query

I am trying to add an incremental column that tallies time in 30-second increments for a report. I need to add this column to an existing query, but I cannot find a way to do it without rebuilding the report, and I don't have that kind of time.
Using IDENTITY or SEQUENCE just gives me an error, because I am querying an open query table generated from a Wonderware Historian query. I need to add a column that shows a 30-second increment per line, starting at zero for line one.
Sorry if I am wording this terribly; I'm unsure how else to ask. Can anyone help me with some code to add a generated incremental INT column to a query without having to make a bunch of extra tables with joins?
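One common way to generate such a column is ROW_NUMBER() over the existing result set. A minimal sketch only, with hypothetical names: HistorianOutput stands in for the open query table and DateTime for its time column:
-- Generate 0, 30, 60, ... per line without extra tables or joins.
SELECT (ROW_NUMBER() OVER (ORDER BY h.[DateTime]) - 1) * 30 AS SecondsElapsed,
       h.*
FROM HistorianOutput AS h;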

Coldfusion Compare Two Query Results from Same Database

I've done some research on this site for an issue I'm having, however, I'm finding that the solution is not exactly what I'm looking for, or the implementation doesn't relate to what I'm trying to do. Or, simply put, I just can't seem to figure it out. Here is my issue.
We have a monthly query that we send to a third party, listing physicians with their degree, specialty and clinic. I have the query established already. But recently they asked us to export only the new results since the previous month's data, instead of the whole results list. So I thought I would create a tool, starting by importing the previous month's data, and then take the query I had been using, put it in a ColdFusion page, run it, and have it show me the records that are new in the current month compared to the previous month. When I run the report each month, it saves that data in the database with the columns r_month and r_year, which simply mean report month/year. To initially populate the database I imported October's data, so I have a base with r_month/r_year being "10" and "2014" respectively; there are 674 records. Then I created my page with a button that runs the same query and saves the results, with r_month and r_year saved as "11" and "2014" respectively. When I do that, I have 682 records. So, for the month of November, there are 8 "different" or new records compared to the previous month (October). My question is: what is the best way to run a query that takes the data from October (10/2014), compares it to November's data (11/2014), and gives me just the 8 records that are new in November?
Sorry this is long, but I wanted to give you as much detail as possible. I don't really have a code sample I can provide, because the way I was attempting it before (using loops etc.) was just not working. I tried looping through the previous month's query and the current month's query to find the difference, but that wasn't working either. I've also tried similar samples I've found on here, but they are either not what I'm looking for, or I just can't figure them out. Basically, at the end of the process there needs to be a button that exports only the new records (in this example, the 8) into an Excel sheet that we can simply email to them.
Any help would be greatly appreciated.
SOLUTION 1 - Since you are using SQL Server you can do this pretty easily within the query. You have already logged the previous data, so you presumably have a key for the "old" physicians in your log table. Try something like this:
<cfquery name="getNewPHys" datasource="#dsn#">
    SELECT *
    FROM sourceTable
    WHERE physID NOT IN (
        SELECT physID
        FROM logtable
        WHERE daterange BETWEEN #somerange# AND #someotherrange#
    )
</cfquery>
You would have to add your own values and vars but you get the idea.
NOTE: This is pseudo-code. You would OF COURSE use cfqueryparam for any of your variables.
SOLUTION 2
Another way to do this is by using a dateAdded or lastUpdated column. Every time a row is updated, you update the lastUpdated column with the current date/time. Then selecting recent records is a matter of selecting any records which have been updated within your range. That's what Leigh suggested in her comment.
I would add one other comment. You seem to be trying to solve this problem without changing anything in your data table. That's not going to work. You need to think about your schema a bit more. For example, solution 2 would involve adding an additional column, and you could even add an MSSQL trigger that automatically updates that field whenever the record is updated. Wouldn't that work?
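A rough sketch of that trigger idea; the table name physicians, its key physID, and the column name lastUpdated are assumptions:
-- Add the audit column (existing rows get the current date/time via the default).
ALTER TABLE physicians
    ADD lastUpdated datetime NOT NULL DEFAULT (GETDATE());
GO
-- Keep lastUpdated current whenever a row changes.
CREATE TRIGGER trg_physicians_lastUpdated
ON physicians
AFTER UPDATE
AS
BEGIN
    UPDATE p
    SET lastUpdated = GETDATE()
    FROM physicians AS p
    INNER JOIN inserted AS i ON i.physID = p.physID;
END;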
I still think we are missing something. Are you perchance overwriting your data each time? Or producing duplicate records - 674 this month, 682 next month with duplicates? If so, that's what you need to correct. Anything else is going to be a bolt on solution that creates more problems down the road.
Step 1 - Add a computed column to your table. Make sure you persist the data so you can index it. The computation should result in values like '201401' for January 2014, etc. Let's call that column YearMonth
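In T-SQL that could look roughly like this; the source date column name ReportDate is an assumption, so substitute your own:
-- Persisted computed column in YYYYMM form, plus an index on it.
ALTER TABLE yourtable
    ADD YearMonth AS CONVERT(char(6), ReportDate, 112) PERSISTED;
CREATE INDEX IX_yourtable_YearMonth ON yourtable (YearMonth);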
Then your code and query looks like this:
<cfset ControlYearMonth = "201410"> <!--- October 2014 --->
<cfquery name="getNewRecords" datasource="#dsn#">
    select field1, field2, etc
    from yourtable
    where YearMonth = <cfqueryparam cfsqltype="cf_sql_varchar" value="#ControlYearMonth#">
    except
    select field1, field2, etc
    from yourtable
    where YearMonth < <cfqueryparam cfsqltype="cf_sql_varchar" value="#ControlYearMonth#">
</cfquery>

SQL Server how to show detailed information of a delete statement

I wanted to perform a simple delete statement like this:
DELETE
FROM table
WHERE table.value = 123
and I am expecting it to delete 512 rows from the table, since those 512 rows have the value 123.
However, there are 5 lines of "xxx rows affected" displayed after running the delete statement.
The last two lines are identical, "512 rows affected", which is expected.
Of those two, the first "512 rows affected" was the actual deletion.
The second "512 rows affected" was a trigger (the only delete trigger) inserting 512 rows into table_AUDIT.
What about the first 3 lines of "xxx rows affected"?
I don't know which tables are affected, so I don't know how to use OUTPUT (which I googled) to figure out which rows/tables are affected.
My question is: how do I display detailed information about the rows deleted? Instead of a meaningless "123 rows affected", I would like to see which rows from which tables were deleted.
The best you can do is get a query plan, which will include the triggers. Which rows were affected is something left to your own detective work - query plans generally do not provide this information.
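For the table you delete from directly, the OUTPUT clause mentioned in the question can at least return the rows removed by your own statement (it will not show rows touched inside the triggers). A minimal sketch against the statement from the question:
-- Return the deleted rows alongside the delete itself.
DELETE FROM [table]
OUTPUT deleted.*
WHERE [table].value = 123;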
