CASE Statement in SQL Server to get output I need - sql-server

So a friend of mine told me to check the almost 5 million email addresses that got 'hacked' on Gmail.
I downloaded the file, a text file with almost 5 million emails (4,804,288). I figured I'd just open it in my text editor and Ctrl+F my email address. Well, it took forever just to open the .txt document, and then it crashed. I then tried Excel, but it has a limit of a little over 1 million rows. Since I'm studying SQL, I figured I'd just load it into SQL Server and query it with a stored procedure. It should be cool.
So.. what did I do?
Created a table called 5Mil.
And bulk inserted the info from the .txt file:
BULK INSERT [dbo].[5Mil]
FROM 'C:\list\google.txt'
WITH (fieldterminator = ',', rowterminator = '@gmail.com')
GO
First question: since the txt file had one line per email, without a ',' at the end, the only way I could load the info was with rowterminator = '@gmail.com', which truncated the '@gmail.com' and left only the username part of the email.
Maybe someone can help me understand how to import the information including the '@gmail.com'.
I was able to import the email addresses one per row; total rows: 4,804,288.
So far so good.
I am currently learning CTEs, so I figured I'd apply one in my stored procedure.
This is what I did.
CREATE PROC googlemails
@email VARCHAR(MAX)
AS
WITH CTE AS
(
SELECT Emails
FROM dbo.[5Mil]
WHERE Emails LIKE '%'+@email+'%'
)
SELECT
CASE
WHEN Emails IS NOT NULL
THEN Emails
ELSE 'you are safe'
END AS 'Google Email'
FROM CTE
When I run the procedure and it finds emails it properly lists them.
But when I put an email address that's not in the list I get
Google Email
Blank. What I want is to show 'you are safe', letting the user know their email was not part of the 5 million 'hacked' addresses.
What would be the proper way to use the CASE statement here? Or, as always, other ways of accomplishing this task, for learning purposes.
Thank you.

SELECT CASE WHEN EXISTS (SELECT 1 FROM dbo.[5Mil] WHERE Emails LIKE '%' + @email + '%')
THEN 'hacked'
ELSE 'not hacked' END
A CTE isn't appropriate here; CTEs are mostly for simplifying large queries and for recursive queries.
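Putting that together, a sketch of the whole procedure rewritten along these lines, using the table and column names from the question (treat it as a starting point, not a tested implementation):

```sql
CREATE PROC googlemails
    @email VARCHAR(320)
AS
-- List the matches when there are any; otherwise return the reassurance row
IF EXISTS (SELECT 1 FROM dbo.[5Mil] WHERE Emails LIKE '%' + @email + '%')
    SELECT Emails AS [Google Email]
    FROM dbo.[5Mil]
    WHERE Emails LIKE '%' + @email + '%'
ELSE
    SELECT 'you are safe' AS [Google Email]
```

The key difference from the CTE version is that the "no match" case is handled by control flow rather than by a CASE expression, which can never produce a row from an empty result set.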

As I understand it, you now have a table called 5Mil with one column called Email, each row holding a username without the "@gmail.com". It would be unnecessarily redundant to append "@gmail.com" to each one.
Soooo... why not just search for the username part? Using LIKE '%email%' is slow and inefficient. It's much simpler to do this in your stored procedure (similar to dudNumber4's answer, but without LIKE):
--SET @Email = 'someusername' -- note: no "@gmail.com"
IF EXISTS (SELECT 1 FROM dbo.[5Mil] WHERE Email = @Email)
    SELECT 'Hacked'
ELSE
    SELECT 'Not hacked'
If you create an index on 5Mil.Email, the search will be almost instant.
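For completeness, a sketch of that index (the index name is my own; the table and column names are from the answer above):

```sql
-- Equality lookups on Email become index seeks instead of full table scans
CREATE NONCLUSTERED INDEX IX_5Mil_Email
    ON dbo.[5Mil] (Email);
```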

Related

Replace function is very slow in SQL Server

I am trying to execute the query below in SQL Server, but it is very slow due to the REPLACE function. Can anyone help optimize the query to run fast?
SELECT
dbo.Orders.OrigID
FROM
dbo.Orders
INNER JOIN
dbo.OrderedReports ON dbo.Orders.OrigID = dbo.OrderedReports.OrderID
WHERE
(REPLACE(Orders.Email, '^^', '''') = 'Khengtjinyap@gmail.com')
One simple long-term fix would be to decode the ^^ sequences (which, per the query above, stand in for apostrophes) before you insert the data. To deal with the email data already in the Orders table, you could run the following one-time update:
UPDATE Orders
SET Email = REPLACE(Email, '^^', '''')
WHERE Email LIKE '%^^%';
You could also create a new column which contains the cleaned up email addresses. Once you have made this change, just use a direct equality comparison in your query, which can now also use an index on the Email column.
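If you go the extra-column route instead, a sketch (CleanEmail and the index name are names I made up for illustration):

```sql
-- Persisted computed column that decodes '^^' back to an apostrophe
ALTER TABLE dbo.Orders
    ADD CleanEmail AS REPLACE(Email, '^^', '''') PERSISTED;

-- Index it so the equality comparison can seek instead of scan
CREATE INDEX IX_Orders_CleanEmail ON dbo.Orders (CleanEmail);

-- The original filter then becomes a plain, sargable equality test:
-- WHERE CleanEmail = 'Khengtjinyap@gmail.com'
```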

How does SQL Server handle failed query to linked server?

I have a stored procedure that relies on a query to a linked server.
This stored procedure is roughly structured as follows:
-- Create local table var to stop query from needing round trips to linked server
DECLARE @duplicates TABLE (eid NVARCHAR(6))
INSERT INTO @duplicates(eid)
SELECT eid FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
-- Many things, including
SET [status] = CASE
WHEN
eid IN (
SELECT eid FROM @duplicates
)
THEN 'String'
ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
-- This view obscures sensitive information and shows only the data that I have permission to see
-- Many other things
The query itself is much more complex, but the key idea is building this temporary table from a linked server (because it takes the query 5 minutes to run if I don't, versus 3 seconds if I do).
I've recently had an issue where I ended up with updates to my table that failed to get checked against the linked server for duplicate information.
The logical chain of events is this:
1. Get all of the data from the original view. The original view contains maybe 3000 records, of which maybe 30 are duplicates of the entity in question, but with one field having a different value.
2. Grab data from a different server to know which of the duplicates is the correct one.
3. When the stored procedure runs, it updates each record.
4. ERROR STEP: when the stored procedure hits a duplicate record, it updates my_table again, so es gets changed multiple times in a row.
The temp table was added after the fact when we realized incorrect es values were being introduced to my_table.
my_database does not contain the data needed to determine which is the correct tuple, hence the requirement for the linked server.
As far as I can tell, we had a temporary network interruption or a connection timeout that stopped my_server from getting the response back from linked_server, and it just passed an empty table to the rest of the procedure.
So, my question is - how can I guard against this happening?
I can't just check if the table is empty, because it could legitimately be empty. I need to definitively know if that initial SELECT from linked_server failed, if it timed out, or if it intentionally returned nothing.
Without knowing the definition of the table you're querying, you could run into an issue where your data is too long and you get a truncation error on your table variable.
Better to make sure and substring it...
DECLARE @duplicates TABLE (eid NVARCHAR(6))
INSERT INTO @duplicates(eid)
SELECT SUBSTRING(eid, 1, 6) FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
SET
-- Many things, including
[status] = CASE
    WHEN eid IN (
        SELECT eid FROM @duplicates
    )
    THEN 'String'
    ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
I had a similar problem where I needed to move data between servers and could not use a network connection, so I ended up doing BCP out and BCP in. It's fast and clean, and it takes away the complexity of user authentication, drivers, and trust domains. It's also repeatable and can be used for incremental loading.
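To address the actual question of telling a failed linked-server read apart from a legitimately empty one, one approach is to wrap the remote SELECT in TRY...CATCH and set a success flag only after it completes; a sketch, reusing the names from the question:

```sql
DECLARE @duplicates TABLE (eid NVARCHAR(6));
DECLARE @fetchSucceeded BIT = 0;

BEGIN TRY
    INSERT INTO @duplicates (eid)
    SELECT eid
    FROM [linked_server].[linked_database].[dbo].[linked_table]
    WHERE es = 'String';

    -- Only reached if the remote query completed without raising an error
    SET @fetchSucceeded = 1;
END TRY
BEGIN CATCH
    -- Timeouts and many connection failures surface here; log and bail out
    THROW;
END CATCH

IF @fetchSucceeded = 1
BEGIN
    -- Safe to run the UPDATE: the table variable reflects a completed remote
    -- read, even if that read legitimately returned zero rows
    PRINT 'proceed with UPDATE';
END
```

One caveat: some severe linked-server errors abort the whole batch before the CATCH block runs, so it's worth testing this against the failure modes your provider actually produces.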

Pentaho error data truncate when insert into SQL Server

I'm developing a Pentaho job to get data from BigQuery and insert it into SQL Server. The job is quite simple, as you can see below, but during the insert into a SQL Server table it throws a 'Data truncation' error. I checked the max length for this column: it is just 64, while in the database it is nvarchar(500). I also logged the error records to a text file, which you can see below. I've spent 3 days on this problem but still don't have an answer. Please guide me.
What I have done so far:
A String Cut step to take a substring
A String Operations step to trim
A LEFT function in the SELECT statement
REGEXP_REPLACE(uuid, ' ', '') in the SELECT statement to remove spaces
Everything I tried produced the same error.
Pentaho job
Error records in text file
I have solved this problem. It was my own silly mistake: I just recreated the table with a larger length for that column.
My case:
post_name nvarchar(50) -> nvarchar(150)
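Recreating the table isn't strictly necessary, by the way; widening the column in place also works. A sketch (the table name here is my assumption):

```sql
-- Widen the column without dropping and recreating the table
ALTER TABLE dbo.posts
    ALTER COLUMN post_name nvarchar(150);
```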

Tracking User activity log for SQL Server database

I have a database with multiple tables, and I want to log user activity from my MVC 3 web application, e.g.:
User X updated category HELLO. Name changed from 'HELLO' to 'Hi There' on 24/04/2011
User Y deleted vehicle Test on 24/04/2011.
User Z updated vehicle Bla. Name changed from 'Blu' to 'Bla' on 24/04/2011.
User Z updated vehicle Bla. Wheels changed from 'WheelsX' to 'WheelsY' on 24/04/2011.
User Z updated vehicle Bla. BuildProgress changed from '20' to '50' on 24/04/2011
My initial idea is to add a couple of lines of code to every action that performs database CRUD, inserting strings like those into a table.
Is there a better way of checking which table and column have been modified than checking every column one by one with if statements (first selecting the current values, then comparing each against the value from the textbox)? I did that for another ASPX web app and it was painful.
Now that I'm using MVC and the ADO.NET Entity Data Model, I'm wondering if there's a faster way to find the columns that were changed and build a log like the one above.
You can also accomplish this by putting your database into the full recovery model and then reading the transaction log.
When the database is in the full recovery model, SQL Server logs all UPDATE, INSERT and DELETE statements (and others, such as CREATE, ALTER, DROP...) into its transaction log.
Using this approach, you don't need to make any additional changes to your application or your database structure.
But you will need a 3rd-party SQL transaction log reader. Red Gate has a free solution for SQL Server 2000 only. If your server is 2005 or higher, you would probably want to go with ApexSQL Log.
This approach will not be able to audit SELECT statements, but it's definitely the easiest to implement if you don't really need to audit SELECT queries.
The way I see, you have two options:
Create triggers in the database side, mapping changes in a table by table basis and getting result into a Log table
OR
Having the code handle the changes. You would have a base class holding the data, and with reflection you could iterate over all object properties and see what has changed, then save that into your Log table. That code would live in your Data Access Layer.
By the way, if you have a good code structure/architecture, I would go with the second option.
You could have a trigger (AFTER INSERT/UPDATE/DELETE) on each table you want to monitor. The beauty is COLUMNS_UPDATED(), which returns a varbinary value indicating which columns have been updated.
Here is some snippet of code that I put in each trigger:
IF (@@ROWCOUNT = 0) RETURN

DECLARE @AuditType_ID int,
        @AuditDate datetime,
        @AuditUserName varchar(128),
        @AuditBitMask varbinary(10)

SELECT @AuditDate = GETDATE(),
       @AuditUserName = SYSTEM_USER,
       @AuditBitMask = COLUMNS_UPDATED()

-- Determine modification type
IF (EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted))
    SELECT @AuditType_ID = 2 -- UPDATE
ELSE IF (EXISTS (SELECT * FROM inserted))
    SELECT @AuditType_ID = 1 -- INSERT
ELSE
    SELECT @AuditType_ID = 3 -- DELETE
(record this data to your table of choice)
I have a special function that can decode the bitmask values, but for some reason it is not pasting well here. Message me and I'll email it to you.
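In lieu of that function, here is a minimal sketch of testing one column's bit in the mask (only valid inside a trigger body; the ordinal 3 is a hypothetical example — look up the real ordinal in sys.columns):

```sql
-- COLUMNS_UPDATED() packs one bit per column, 8 columns per byte,
-- in column-ordinal order (low bit = lowest ordinal within each byte)
DECLARE @ord int = 3  -- hypothetical column ordinal; see sys.columns

IF SUBSTRING(COLUMNS_UPDATED(), (@ord - 1) / 8 + 1, 1)
       & POWER(2, (@ord - 1) % 8) > 0
    PRINT 'column 3 was updated'
```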

How to send email from execute sql task?

How can I send an email when an Execute SQL Task runs and loads a table? That is, if the table is loaded with any records, send an email; if it isn't loaded, send no email.
I appreciate any help.
After that task, add another task that checks whether there are any rows in the table, something like:
IF EXISTS (SELECT 1 FROM Yourtable)
BEGIN
EXEC msdb.dbo.sp_send_dbmail ---------
END
Read up on sp_send_dbmail here: http://msdn.microsoft.com/en-us/library/ms190307.aspx
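For the elided part above, a sketch of a fuller sp_send_dbmail call (the profile name and recipient are placeholders you'd swap for your own Database Mail setup):

```sql
IF EXISTS (SELECT 1 FROM Yourtable)
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'MyMailProfile',    -- placeholder Database Mail profile
        @recipients   = 'ops@example.com',  -- placeholder recipient
        @subject      = 'Table loaded',
        @body         = 'The load completed and the table contains rows.';
END
```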
It's cool, I solved it. I used two Execute SQL Tasks: the first loads the data into the table, and the second counts the records. I put a condition on the green arrow (@MyVariable > 0) and connected the Send Mail task.
Thanks to all.