Function That Detects if Tweets are similar - arrays

I am trying to create a function in VBA that takes 2 strings and a threshold(a percentage in decimal form), and returns true if the strings contain a higher percentage of the same words than the threshold.
Here is the code that I have so far...
Function isDup(tweet1 As String, tweet2 As String, threshold As Double) As Boolean
'Declare variables to store words from each tweet
Dim C1 As String
Dim C2 As String
'Use split function to split each tweet into single words. The " " is the delimiter, each space creates a new word
C1 = Split(tweet1, " ")
C2 = Split(tweet2, " ")
'Loop through each word from tweet1 and each word from tweet2
For i = LBound(C1) To UBound(C1)
For j = LBound(C2) To UBound(C2)
'Declare variable to store result from StrComp Function
Dim Cresult As Double
'Use StrComp Function to compare the current word from tweet1 to the current word from tweet2
Cresult = StrComp(i, j, vbTextCompare)
Next i
Next j
'Use If Then to return true if the tweets are more similar than the percentage given by the threshold
If Cresult > threshold Then
isDup = True
End Function
I am pretty new to VBA so there are some errors, specifically I keep running into the Expected: Array Error. Any help would be greatly appreciated, thank you!

Here's a quick rewrite fixing up the things I noted in my comments above. If this isn't exactly what you were after, it should get you in the ballpark.
Function isDup(tweet1 As String, tweet2 As String, threshold As Double) As Boolean
'Declare variables to store words from each tweet
Dim C1 As Variant
Dim C2 As Variant
'Use split function to split each tweet into single words. The " " is the delimiter, each space creates a new word
C1 = Split(tweet1, " ")
C2 = Split(tweet2, " ")
'Declare variable to store result from StrComp Function
Dim Cresult As Double
'Loop through each word from tweet1 and each word from tweet2
For i = LBound(C1) To UBound(C1)
For j = LBound(C2) To UBound(C2)
'Use StrComp Function to compare the current word from tweet1 to the current word from tweet2
If StrComp(C1(i), C2(j), vbTextCompare) = 0 Then
Cresult = Cresult + 1
End If
Next j
Next i
'Use If Then to return true if the tweets are more similar than the percentage given by the threshold
If Cresult > threshold Then
isDup = True
End If
End Function

Related

Deleting instances of chars using arrays and loops

I have a column where almost every cell is made of a combination of numbers and letters and symbols ("TS-403" or "TSM-7600"). I want every char that's not an integer to be deleted/replaced with an empty string, so that I'm left only with numbers ("403").
I've thought up of two approaches:
I think the best one is to create an array of integers with the numbers 0-9, and then iterate through the cells with a for loop where if the string in a cell contains a char that's not in the array, then that symbol (not the entire cell) should be erased.
Sub fixRequestNmrs()
Dim intArr() as Integer
ReDim intArr(1 to 10)
For i = 0 to 9
intArr(i) = i
Next i
Dim bRange as Range
Set bRange = Sheets(1).Columns(2)
For Each cell in bRange.Cells
if cell.Value
// if cell includes char that is not in the intArr,
// then that char should be deleted/replaced.
...
End Sub()
Perhaps the second approach is easier, which would be to use the Split() function as the '-' is always followed by the numbers, and then have that first substring replaced with "". I'm very confused on how to use the Split() function in combination with a range and a replace funtion though...
For Each cell in bRange.Cells
Cells.Split(?, "-")
...
Digits to Integer Using the Like Operator
The Function
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' Purpose: Returns an integer composed from the digits of a string.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function DigitsToInteger(ByVal SearchString As String) As Long
Dim ResultString As String
Dim Char As String
Dim n As Long
For n = 1 To Len(SearchString)
Char = Mid(SearchString, n, 1)
If Char Like "[0-9]" Then ResultString = ResultString & Char
Next n
If Len(ResultString) = 0 Then Exit Function
DigitsToInteger = CLng(ResultString)
End Function
A Worksheet Example
Sub DigitsToIntegerTEST()
Const FIRST_ROW As Long = 2
' Read: Reference the (single-column) range.
Dim wb As Workbook: Set wb = ThisWorkbook ' workbook containing this code
Dim ws As Worksheet: Set ws = wb.Worksheets("Sheet1")
Dim LastRow As Long: LastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
If LastRow < FIRST_ROW Then Exit Sub ' no data
Dim rg As Range: Set rg = ws.Range("B2", ws.Cells(LastRow, "B"))
Dim rCount As Long: rCount = rg.Rows.Count
' Read: Return the values from the range in an array.
Dim Data() As Variant
If rCount = 1 Then
ReDim Data(1 To 1, 1 To 1): Data(1, 1) = rg.Value
Else
Data = rg.Value
End If
' Modify: Use the function to replace the values with integers.
Dim r As Long
For r = 1 To rCount
Data(r, 1) = DigitsToInteger(CStr(Data(r, 1)))
Next r
' Write: Return the modifed values in the range.
rg.Value = Data
' To test the results in the column adjacent to the right, instead use:
'rg.Offset(, 1).Value = Data
End Sub
In VBA (Simple)
Sub DigitsToIntegerSimpleTest()
Const S As String = "TSM-7600sdf"
Debug.Print DigitsToInteger(S) ' Result 7600
End Sub
In Excel
=DigitsToInteger(A1)
If you have the CONCAT function, you can do this with a relatively simple formula -- no VBA needed:
=CONCAT(IFERROR(--MID(A1,SEQUENCE(LEN(A1)),1),""))
If you prefer a non-VBA solution in an earlier version of Excel, there is a more complex formula available, but I'd have to go back through my files to locate it.
A tricky function GetVal()
The following function
translates a string into a single characters array arr via help function String2Arr()
isolates them into numeric (category code 6) or non-numeric categories (other) via a tricky execution of Application.Match (here without its 3rd argument which is mostly used for precise search, and by comparing two arrays)
finds the starting position in the original string via Instr()
returns the value of the right substring via Val() (~> see note).
Function GetVal(ByVal s As String) As Double
Dim arr: arr = String2Arr(s): Debug.Print Join(arr, "|")
Dim chars: chars = Split(" ,',+,-,.,0,A", ",")
Dim catCodes: catCodes = Application.Match(arr, chars) 'No 3rd zero-argument!!
Dim tmp$: tmp = Join(catCodes, ""): Debug.Print Join(catCodes, "|")
Dim pos&: pos = InStr(tmp, 6) ' Pos 6: Digits; pos 1-5,7: other symbols/chars
GetVal = Val(Mid(s, pos)) ' calculate value of right substring
End Function
Notes
The Val function can translate a (sub)string with starting digits to a number, even if there are following non-numeric characters.
Help function String2Arr()
Atomizes a string into a single characters array:
Function String2Arr(ByVal s As String)
s = StrConv(s, vbUnicode)
String2Arr = Split(s, vbNullChar, Len(s) \ 2)
End Function
Example call
Dim s As String
s = "aA+*&$%(y#,'/\)!-12034.56blabla"
Debug.Print GetVal(s) ' ~~> 12034.56

Count the number of uppercase words in a text review

I want to count the number of uppercase words in a data set of movie reviews (A2:A1001) and paste the results to column E.
I am having trouble getting any response from the sub on the spreadsheet.
Sub UppercaseWordCount()
Dim ArraySplit() As String
Dim X As Integer
Dim Count As Integer
Dim NextWord As String
Dim Line As Integer
Count = 0
ArraySplit = Split("A2:A1001", " ")
Line = 2
'splitting each review into an array and using lower
'and upper bounds with Ucase function to get a count
'len function used to split up words from single characters
For X = LBound(ArraySplit) To UBound(ArraySplit)
NextWord = ArraySplit(X)
If NextWord = UCase(NextWord) And Len(NextWord) >= 2 Then
Count = Count + 1
End If
Next
'calling the sub to column E with a count for each review
Range("E" & Line).Value = Count
Line = Line + 1
End Sub
Try this. You could convert to a custom function if doing repeatedly.
Sub UppercaseWordCount()
Dim ArraySplit() As String
Dim X As Long 'long better than integer
Dim Count As Long
Dim NextWord As String
Dim r As Range
'splitting each review into an array and using lower
'and upper bounds with Ucase function to get a count
'len function used to split up words from single characters
For Each r In Range("A2:A1001") 'loop through defined range
ArraySplit = Split(r, " ") 'split each cell using space delimiter
For X = LBound(ArraySplit) To UBound(ArraySplit) 'everything else as before
NextWord = ArraySplit(X)
If NextWord = UCase(NextWord) And Len(NextWord) >= 2 Then
Count = Count + 1
End If
Next
'calling the sub to column E with a count for each review
r.Offset(, 4).Value = Count 'can use offset rather than defining a new variable
Count = 0 'reset count
Next r
End Sub

Compare strings to identify duplicates

I have to write an isDup function to compare two tweets based on their similar word counts to determine if the tweets are duplicate, based on a decimal threshold chosen (0-1).
My process is to write a sub with two hardcoded tweets my prof has provided (just to get an understanding before converting to a function). I encountered a run time error 5.
Option Explicit
Sub isDup()
Dim tweet1 As String
Dim tweet2 As String
Dim threshold As Double
threshold = 0.7
tweet1 = "Hours of planning can save weeks of coding"
tweet2 = "Weeks of programming can save you hours of planning"
Dim tweet1Split() As String
tweet1Split = Split(tweet1, " ")
Dim tweet2Split() As String
tweet2Split = Split(tweet2, " ")
Dim i As Integer
Dim j As Integer
Dim sameCount As Integer
'my thought process below was to compare strings i and j to see if equal, and if true add 1 to sameCount,
'but the If StrComp line is where the error is
For i = LBound(tweet1Split) To UBound(tweet1Split) Step 1
For j = LBound(tweet2Split) To UBound(tweet2Split) Step 1
If StrComp(i, j, vbDatabaseCompare) = 0 Then
sameCount = sameCount + 1
Exit For
End If
Next j
Next i
End Sub
'here i wanted to get a total count of the first tweet to compare, the duplicate tweet is true based on the number of
'similar words
Function totalWords(tweet1 As String) As Integer
totalWords = 0
Dim stringLength As Integer
Dim currentCharacter As Integer
stringLength = Len(tweet1)
For currentCharacter = 1 To stringLength
If (Mid(tweet1, currentCharacter, 1)) = " " Then
totalWords = totalWords + 1
End If
Next currentCharacter
End Function
'this is where i compute an "isDup score" based on similar words compared to total words in tweet1, in this
'example the threshold was stated above at 0.7
Dim score As Double
score = sameCount / totalWords
If score > threshold Then
MsgBox "isDup Score: " & score & " ...This is a duplicate"
Else
MsgBox "isDup Score: " & score & " ...This is not a duplicate"
End If
End Sub
First issue:
i and j are just indexes. You want to compare the string that your index relates to so:
If StrComp(tweet1Split(i), tweet2Split(j), vbDatabaseCompare) = 0 Then
Second issue:
As noted in Microsoft documentation for StrComp, vbDatabaseCompare is reserved for Access, which you are not using, hence the source of your second error. You need to switch to a different comparison

Remove characters from a word - VBA

I want to remove characters from a string based on the position. i use the below code and i manage to:
Insert all word letters in an array
Loop the array and replace the letters that i want with nothing
BUT i dont know have to remove nothing from the array, recreate array index and create the word again without those characters.
'Create an array with all letters
For j = 1 To Len(SheetName2)
ReDim Preserve arrWord(k)
arrWord(k) = Mid(SheetName2, j, 1)
k = k + 1
Next j
Counter = (Len(SheetName2) - 31) * 2
'Loop array and replace with nothing
For w = Counter - 1 To 0 Step -2
arrWord(w) = ""
Next w
You don't need an array of the characters. Just read every second character into a new string.
Public Sub OutputEverySecondChar()
Dim TestString As String
TestString = "abcdefghij"
Dim OutputString As String
Dim iChar As Long
For iChar = 1 To Len(TestString) Step 2
OutputString = OutputString & Mid$(TestString, iChar, 1)
Next iChar
Debug.Print OutputString 'output is: acegi
End Sub
Or even better as re-useable function:
Public Sub test()
Debug.Print ReturnEveryXcharcter(InputString:="abcdefghij", FirstChar:=1, xChar:=2, CharLength:=1)
End Sub
Public Function ReturnEveryXcharcter(InputString As String, FirstChar As Long, xChar As Long, CharLength As Long) As String
Dim iChar As Long
For iChar = FirstChar To Len(InputString) Step xChar
ReturnEveryXcharcter = ReturnEveryXcharcter & Mid$(InputString, iChar, CharLength)
Next iChar
End Function
So the function is even more flexible and you can eg use the following to delete every 3ʳᵈ character (beginning with the first):
Debug.Print ReturnEveryXcharcter(InputString:="abcdefghij", FirstChar:=2, xChar:=3, CharLength:=2)
'output will be: bcefhi
Edit according comment:
Then loop until it is less then 31 characters:
Public Sub test()
Dim SheetName As String
SheetName = "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
Do
SheetName = ReturnEveryXcharcter(InputString:=SheetName, FirstChar:=1, xChar:=2, CharLength:=1, LengthLimit:=31)
Loop While Len(SheetName) > 31
Debug.Print SheetName
End Sub
My solution:
If Len(SheetName1) > 31 Then
SheetName2 = Replace(SheetName1, " ", "")
'Third check - Remove letter by letter starting from the beginning
If Len(SheetName2) > 31 Then
Counter = (Len(SheetName2) - 31) * 2
Part_1 = Left(SheetName2, Counter)
Part_2 = Mid(SheetName2, Counter + 1, Len(SheetName2) - (Counter))
Semi_Part = ""
For j = 1 To Counter Step 2
'Use only Part_1 because it ll be the same in all occurance
Semi_Part = Semi_Part + Mid(Part_1, j, 1)
Next j
FinalSheetname = Semi_Part & Part_2
Else
FinalSheetname = SheetName2
End If
Else
FinalSheetname = SheetName1
End If
Just for interest, you can also do this with a worksheet function if you have one of the more recent versions of Excel (O365 or 2016+) that has the CONCAT function.
The "trick" is to create an array that includes the positions of the characters you wish to retain.
In the case of deleting every even space positioned character (retaining the odd positions):
=CONCAT(MID(A1,(ROW(INDEX($A:$A,1,1):INDEX($A:$A,ROUNDUP(LEN(A1)/2,0),1))-1)*2+1,1))
Since this is an array formula, you need to "confirm" it by holding down ctrl + shift while hitting enter. If you do this correctly, Excel will place braces {...} around the formula as observed in the formula bar

Regex expression and match function in text file

I have problem with my matching function actually I have to count number of lines with specific string and return line number ,so I have one dimensional array of string that contain the unique strings of text file {33,7,77,3 23,6} and text file with the same strings in array I have read lines of text file to array , but with duplicate of these strings ,when I use regex.match it works not bad expect when I check if line 2 contain 3 the function return True it's consider 3 in 23 as 3 , and the above explanation is just example of what I need any help please
Module Module1
Sub Main()
Dim txt As String = File.ReadAllText("e:\ii.txt")
' Use regular expressions to replace characters
' that are not letters or numbers with spaces.
Dim reg_exp As New Regex("[^a-zA-Z0-9]")
txt = reg_exp.Replace(txt, " ")
' Split the text into words.
'Dim words() As String = txt.Split( _
' New Char() {" "c}, _
' StringSplitOptions.RemoveEmptyEntries)
Dim words = txt.Split(New String() {" ", Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
' Use LINQ to get the unique words.
Dim word_query = _
(From word As String In words _
Order By word _
Select word).Distinct()
Dim stra() As String = word_query.ToArray()
For i = 0 To stra.Length - 1
Console.WriteLine(" " & stra(i))
Next
Dim lines() As String = IO.File.ReadAllLines("e:\ii.txt")
For i = 0 To lines.Length - 1
Console.WriteLine(lines(1))
Dim linecount = 0
Dim regex As New Regex(stra(i), RegexOptions.ExplicitCapture)
Dim match As Match = regex.Match(lines(1))
If match.Success Then
linecount += 1
Console.WriteLine("linecount= " & linecount)
Else
Console.WriteLine("false")
End If
Next
End Sub
End Module
You many not have to split the text into words. Is your word list very long? From what I understand you want the following:
1.Read a text file and return the line number for a given word or phrase.
Is the word or phrase complex? If not, why not use a the Contains extension method?
For example:
Dim myString = "Hello World"
If myString.Contains("World") Then
'Add line number to line count.
End if
If you are using this as an opportunity to learn regular expressions, I highly recommend "Mastering Regular Expressions" by Jeffrey Friedl. When I first begun I invested in a program RegexBuddy, which is worth the money. But now there are so many online regex testers now, that could be an alternative for something free.
Enhance your regex with anchors. These will ascertain that the whole test string matches instead of a substring. The following code also assembles all match patterns of interest into a single regex pattern which will be used against each line of the target file:
Dim strall As String
strall = ""
For i = 0 To stra.Length - 1
If i > o Then
strall = strall & "|"
End If
strall = strall & stra(i)
Console.WriteLine(" " & stra(i))
Next
strall = "^(" & strall & ")$"
Dim regexall As New Regex(strall, RegexOptions.ExplicitCapture)
'...
Dim linecount = 0
Dim match As Match = regexall.Match(lines(i)) '... was 'lines(1)', probably a typo
If match.Success Then
'...
this code is working with me thanks for all
Public Function countlines(ByVal st As String) As Integer
Dim count As Integer
Dim linecount As Integer = 0
Dim substrings() As String = Regex.Split(st, " ")
Dim stt() As String = {23, 7, 3}
For i = 0 To stt.Length - 1
'For j = 0 To substrings.Length - 1
'Console.WriteLine(substrings(0))
'For i = 0 To substrings.Length - 1
'Console.Write(substrings(i))
Dim matchQuery = From word In substrings Where word.ToLowerInvariant() = stt(i).ToLowerInvariant() Select word
' ' Count the matches.
count = matchQuery.Count()
Console.WriteLine("count=" & count)
If count > 0 Then
linecount += 1
Else
Console.WriteLine(" linecount=" & linecount)
End If
Next
Console.WriteLine("linecount= " & linecount)
Return linecount
End Function
Sub Main()
Dim lines() As String = IO.File.ReadAllLines("e:\ii.txt")
For Each line As String In lines
countlines(line)
Next
End Sub

Resources