Getting Rid of Unwanted Text in SQL Server 2005 Database - sql-server

I hope someone can help. My database was infected by a virus which inserted a string of characters into virtually every text and varchar field.
Before the virus, data was stored like "San Diego Football Team".
After the virus, data reads like "San Diego Football Team - plus a string of HTML characters, style sheet content, URLs, etc."
I can't include what the tags are because they will not be rendered in this message.
To complicate matters, the additional information consists of random strings of HTML tags (with data), so it's difficult to identify patterns. Luckily, all of the inserted text starts with the title tag (with brackets). Any suggestions for removing this text easily? We have roughly 500k records.
Many thanks for your help.
Jim

Try this to get rid of the appended text:
update #table
set firstname = SUBSTRING(firstname, 1, CHARINDEX('<title', firstname) - 1)
where CHARINDEX('<title', firstname) > 0
This assumes the injected text comes after the original text. Try it on a single column first, then extend it to the other columns.
Use this with caution!
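The truncation rule used by that UPDATE can be tried out on a few sample values before touching the database. The following is a minimal Python sketch of the same logic, not T-SQL; the function name `strip_injected` is made up for illustration, and the case-insensitive match is an assumption that mirrors CHARINDEX under a case-insensitive collation.

```python
import re


def strip_injected(value, marker="<title"):
    """Return value cut off just before the first occurrence of marker.

    The match is case-insensitive (an assumption, see above). Values that
    do not contain the marker are returned unchanged.
    """
    match = re.search(re.escape(marker), value, flags=re.IGNORECASE)
    return value if match is None else value[:match.start()]


# Sample values: one infected, one clean.
print(strip_injected("San Diego Football Team<title>junk</title><a href=x>"))
print(strip_injected("Clean value"))
```

Running the rule on a handful of exported values first makes it easier to spot columns where the marker could legitimately appear in the original data.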

Related

How to get rows written in Cyrillic / Russian in SQL Server

I have a SQL Server table which stores data from all across the world, but I want to retrieve only the rows containing Russian characters.
I tried the query below, but it returns all non-English data.
select * from tablename where column like '%[^-A-Za-z0-9 /.+$]%'
Is there a way to get only Russian characters?
Thanks in advance.
I would suggest checking a single character from the string (the first one, for example) and testing whether its code lies between the first and the last letter of the alphabet.
For example, like this (note the N prefix, which keeps the literals as Unicode):
select *
from tablename
where unicode(substring(column, 1, 1)) between unicode(N'А') and unicode(N'я')
Of course, this approach does not give you "all Russian characters", but it does return the rows whose text is written in Russian. I suppose that is what you're really asking for :)
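The range check can be illustrated outside the database as well. This is a small Python sketch of the same idea, assuming the text is already available as Unicode strings; the function name is hypothetical. One caveat worth knowing: the letters Ё (U+0401) and ё (U+0451) fall outside the range А..я (U+0410..U+044F), so a string starting with either of them would not match.

```python
def starts_with_russian_letter(text):
    """True if the first character lies in the range 'А'..'я' (U+0410..U+044F)."""
    if not text:
        return False
    return "\u0410" <= text[0] <= "\u044f"


# Hypothetical sample rows in several scripts.
rows = ["Москва", "Moscow", "東京", ""]
print([row for row in rows if starts_with_russian_letter(row)])
```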

Detecting split character in string

Recently I had a problem with splitting an HTML string into separate columns in Access 2010 and storing the data in a SQL Server 2008 R2 table. This works more or less, because the number of lines varies.
To avoid storing the HTML tags I have an easier solution: I use one text field in Access which is formatted as standard text, while the other text field is formatted as rich text to keep the formatting.
Inserting the same content, copied from an HTML mail, into the standard text field maintains the line feed!
How can I detect which character it is, so that I can replace it with another character for further use?
Example:
<div><font face=Arial size=3 color=black>Customer Name</font></div>
<div><font face=Arial size=3 color=black>Customer Address</font></div>
<div><font face=Arial size=3 color=black>12345 City</font></div>
is the content copied from the HTML mail and inserted into the rich-text field.
If I copy this content to a standard text field, it looks like this:
Customer Name
Customer Address
12345 City
In the server table, the content of the field looks like this:
Customer Name Customer Address 12345 City
But to maintain the line feed in the Access field there must be some character, but which one?
Thanks!
Michael
In the SQL Server table, the multiple lines are displayed as a single line. If you set the ControlSource property of the Access TextBox to that column, the value will be shown with multiple lines.
If you still want to know the characters: it's a combination of the ASCII control characters CR and LF (chr(13) & chr(10), or vbCrLf, or vbNewLine). In SQL Server syntax it's CHAR(13)+CHAR(10).
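To see which characters are hiding in a value, one option is to list every control character together with its position and code. The following Python sketch does that for a sample string; the sample value and the function name are made up for illustration, and the replacement character `|` is an arbitrary choice.

```python
def describe_control_chars(text):
    """List (position, character code) for every control character in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) < 32]


# Assumed sample: the three address lines separated by CR LF.
sample = "Customer Name\r\nCustomer Address\r\n12345 City"
print(describe_control_chars(sample))

# Once the separator is known, it can be swapped for another character.
print(sample.replace("\r\n", "|"))
```

Code 13 is CR and code 10 is LF, matching chr(13) & chr(10) in Access and CHAR(13)+CHAR(10) in SQL Server.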

Word wrap issues with SSIS Flat file destination

Background: I need to generate a text file with 5 records, each 1565 characters long. This text file is then used to feed the data into another piece of software.
Hence, there are some required fields and some optional fields. I created a query that concatenates all the fields into one single field, and I populated the optional fields with blanks.
For example:
Here is the sample input layout for each fields
Field CharLength Required
ID 7 Yes
Name 15 Yes
Address 15 No
DOB 10 Yes
Age 1 No
Information 200 No
IDNumber 13 Yes
Then I wrote a query that combines the above fields into a single row for each unique ID, which looks like the following:
SELECT CAST(1 AS CHAR(7)) + CAST('XYZ' AS CHAR(15)) + CAST('' AS CHAR(15)) + CAST('22/12/2014' AS CHAR(10))
     + CAST('' AS CHAR(1)) + CAST('' AS CHAR(200)) + CAST('123456' AS CHAR(13))
UNION
SELECT CAST(2 AS CHAR(7)) + CAST('XYZ' AS CHAR(15)) + CAST('' AS CHAR(15)) + CAST('22/12/2014' AS CHAR(10))
     + CAST('' AS CHAR(1)) + CAST('' AS CHAR(200)) + CAST('123456' AS CHAR(13))
Then I created an SSIS package to produce the output text file through a delimited Flat File destination.
Problem:
The flat file is generated with the desired record length (1565), but the text file looks different depending on whether word wrap is on or off. When word wrap is off, I get each record on a single line. When word wrap is on, the line is broken into multiple lines. The length of the record is the same in either case.
I even tried using VARCHAR + SPACE in the query instead of CHAR for each field, but without success. It still breaks the line at the blank fields.
For example: Cast('' as varchar(1)) + Space(200-len(Cast('' as varchar(1)))) for the Information field
Question: How do I keep each record on a single line even when word wrap is on?
Since it's my first post, please excuse the format of the question.
The purpose of word wrap is to put characters on the next line in instances of overflow rather than creating an extremely horizontal scrolling document.
Word wrap is the additional feature of most text editors, word processors, and web browsers, of breaking lines between words rather than within words, when possible.
Because this is what word wrap is, there's nothing you can do to change its behavior. What does it matter anyway? The document should still be parsed as you would expect. Just don't turn word wrap on.
As far as I'm aware, having word wrap on or off has no impact on the document itself, it's simply a presentation option.
Applications parsing a document parse it as if word wrap were off. Something that could throw off parsing is breaks for a new line, but that is a completely different thing from word wrap.
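A quick way to confirm that the file itself is fine is to measure the records by their real line breaks rather than by how an editor displays them. The Python sketch below builds two records from the sample layout in the question (which adds up to 261 characters, not the full 1565) and checks their lengths; the helper names and the CR LF record separator are assumptions for illustration.

```python
# Field widths from the sample layout: ID, Name, Address, DOB, Age,
# Information, IDNumber.
WIDTHS = [7, 15, 15, 10, 1, 200, 13]


def build_record(values):
    """Pad (or cut) each value to its fixed width, like CAST(... AS CHAR(n))."""
    return "".join(str(v).ljust(w)[:w] for v, w in zip(values, WIDTHS))


def record_lengths(data):
    """Return the length of each record, splitting only on real line breaks."""
    return [len(line) for line in data.splitlines()]


records = [
    build_record([1, "XYZ", "", "22/12/2014", "", "", "123456"]),
    build_record([2, "XYZ", "", "22/12/2014", "", "", "123456"]),
]
data = "\r\n".join(records) + "\r\n"
print(record_lengths(data))
```

If every record reports the expected length, the breaks seen with word wrap on are purely visual and the consuming software should not be affected.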

MS SQL : Full Text Search Results are not relevant

I am trying an MS SQL full-text query on a single column.
For this I am using the "FREETEXTTABLE" function.
When I query "Horse ride", the result set contains videos whose title contains the word "ride".
As I understand it, FREETEXT and FREETEXTTABLE break the query string into words and generate inflectional forms of them, and that is how the result set gets generated.
So my question is: if this is the process, why does the result set contain no videos where the word "horse" is present? (I have videos in the DB whose titles contain the word "horse".)
Is it because the word breaker gives preference to verbs?
Please comment on how the "word breaker" and "stemmer" work for the English language.
Links where I could find more detail about the "word breaker" and "stemmer" would also be helpful.
It is very important for me to get relevant results every time.
Thank you.
Full-text search filters out noise words and punctuation, and you have the flexibility of adding more noise words to the default list. To control how verbs, inflectional forms or synonyms are matched, you can use different functions in the WHERE clause.
In your case, if you are looking for fields where both the word "horse" AND the word "ride" exist, you can simply use the CONTAINS function, something like this:
SELECT ColumnName
FROM TableName
WHERE Contains(ColumnName, '"horse" AND "ride"')
If you are looking for values that contain the word "horse" and any inflectional form of "ride" (ride, riding, and so on), you can use something like this:
SELECT ColumnName
FROM TableName
WHERE Contains(ColumnName, '"horse"') AND CONTAINS(ColumnName, 'FORMSOF(INFLECTIONAL, ride)')

Hacked SQL Server database need regex

A database that a client of mine has was hacked. I am in the process of trying to rebuild the data. The site is running classic ASP with a SQL Server database. I believe I have found where the weak point was for the hackers and removed that entry point for now.
Every text column in the database had some HTML markup and inline script/JS tags appended to it.
Here is an example of a field:
all</title><script>
document.write("<style>.aq21{position:absolute;clip:rect(436px,auto,auto,436px);}</style>");
</script>
<div class=aq21>
<a href=http://samedaypaydayloansonlineelqmt.com >same day payday loans online</a>
<a href=http://samedaypaydayloan
This example was in the Users table in the UserRights column. The initial value was all, but then you can see the links that were appended.
I need to write a regex script that will search through all fields in each column of each table in the database and remove this extra markup.
Essentially, if I match </title>, then that string and everything that follows it can be replaced with a blank string.
The appended string is the same for every field in the same column. However, there are multiple columns in each table.
This is what I have been doing so far to replace the hacked part, but a nice regex would probably help me out, though my regex skills... well, suck.
UPDATE [databasename].[db].[databasetable]
set
UserRights = replace(UserRights,'</title><script>document.write("<style>.aq21{position:absolute;clip:rect(436px,auto,auto,436px);}</style>");</script><div class=aq21><a href=http://samedaypaydayloansonlineelqmt.com >same day payday loans online</a><a href=http://samedaypaydayloan','');
Any regex help and/or tips are appreciated.
This is what I ended up doing (big thanks to @Bohemian):
I went through each table and checked which columns were affected. Then I ran the following script on each column:
UPDATE [tablename]
set columnname = substring(columnname, 1, charindex('</', columnname)-1)
where columnname like '%</%';
If a column contained legitimate markup, I updated those records manually (luckily for me there were only a couple of such records).
If anyone has any better solutions, please feel free to comment.
Thanks!
Since the bad stuff starts with a <, which is an unusual character to find in normal data, I would use normal text functions, something like this:
update mytable set
mycol = substring(mycol, 1, charindex('<', mycol) - 1)
where mycol like '%<%';
And methodically do this with every column of every table.
Note that I'm only guessing at the right function to use, since I'm unfamiliar with SQL Server, but you get the idea.
I welcome someone editing the SQL to improve it.
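Doing this methodically for every column gets tedious by hand, so one option is to generate the statements and review them before running anything. The Python sketch below only builds the T-SQL text; it does not connect to a database. The helper names and the table and column list are hypothetical, and the generated statement uses CHARINDEX in the WHERE clause instead of LIKE so that markers containing LIKE wildcards need no extra escaping.

```python
def quote_ident(name):
    """Wrap an identifier in brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


def build_cleanup_sql(table, column, marker="<"):
    """Build an UPDATE that cuts each value off just before the marker."""
    t, c, m = quote_ident(table), quote_ident(column), quote_literal(marker)
    return (
        f"UPDATE {t} SET {c} = SUBSTRING({c}, 1, CHARINDEX({m}, {c}) - 1) "
        f"WHERE CHARINDEX({m}, {c}) > 0;"
    )


# Hypothetical list of affected (table, column) pairs.
targets = [("Users", "UserRights"), ("Users", "DisplayName")]
for table, column in targets:
    print(build_cleanup_sql(table, column, "</title>"))
```

Running the generated statements against a backup copy first is a sensible precaution, since any column that legitimately contains the marker would be truncated too.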
