Related
I paste data from elsewhere to a userform.
For example (the copied string)
Clinical: history of heart disease
Labs: elevated cholesterol on 8Aug
Meds: just started cholesterol medication
Supplements: none
Allergies: none
Activity: recently started going to YMCA 3x/wk (elliptical and some weight lifting
I want to paste the above string into textbox1.
The string should then be split into appropriate headings on textboxes 2 to 7 on the same userform.
In textbox 2, I want everything between "Clinical:" and "Labs:"
"history of heart disease" without the headings.
If “Labs:” is not present, I want everything between Clinical: and Meds (or next heading)
At this point, I think a loop to repeat this process but for the next items
(e.g. texbox 3 = everything between Labs: and Meds – or next heading; Textbox4 = everything between Meds: and Supplements) – or next heading; etc.
Private Sub CommandButton1_Click()
Dim strnames(1 To 6) As String
strnames(1) = "Clinical: "
strnames(2) = "Labs: "
strnames(3) = "Meds: "
strnames(4) = "Supps: "
strnames(5) = "Allergies: "
strnames(6) = "Activity: "
strnames(7) = "NFPE: "
Dim check As Integer
str1 = TextBox1
x = 1
For box = 1 To 6
If InStr(TextBox1.Text, strnames(1)) > 0 Then
str2 = SuperMid(str1, strnames(x), strnames(x + 1))
TextBox2 = str2
End If
If InStr(TextBox1.Text, strnames(1)) = 0 Then
TextBox2 = "none"
End If
Next box
End sub
This is the code that I have been using (from wellsr.com) to capture the data between word1 and word2 of the array. The problem occurs when a word in the array is not present at which point it adds all of the text following the first word.
Public Function SuperMid(ByVal strMain As String, str1 As String, str2 As String,
Optional reverse As Boolean) As String
Dim i As Integer, j As Integer, temp As Variant
On Error GoTo errhandler:
If reverse = True Then
i = InStrRev(strMain, str1)
j = InStrRev(strMain, str2)
If Abs(j - i) < Len(str1) Then j = InStrRev(strMain, str2, i)
If i = j Then 'try to search 2nd half of string for unique match
j = InStrRev(strMain, str2, i - 1)
End If
End If
If reverse = False Then
i = InStr(1, strMain, str1)
j = InStr(1, strMain, str2)
If Abs(j - i) < Len(str1) Then j = InStr(i + Len(str1), strMain, str2)
If i = j Then 'try to search 2nd half of string for unique match
j = InStr(i + 1, strMain, str2)
End If
End If
If i = 0 And j = 0 Then GoTo errhandler:
If j = 0 Then j = Len(strMain) + Len(str2) 'just to make it arbitrarily large
If i = 0 Then i = Len(strMain) + Len(str1) 'just to make it arbitrarily large
If i > j And j <> 0 Then 'swap order
temp = j
j = i
i = temp
temp = str2
str2 = str1
str1 = temp
End If
i = i + Len(str1)
SuperMid = Mid(strMain, i, j - i)
Exit Function
errhandler:
MsgBox "Error extracting strings. Check your input" & vbNewLine & vbNewLine & "Aborting", , "Strings not found"
End
End Function
Sometimes you need to add a little complication to make things easier. The code below may be of interest.
Option Explicit
' This code requires a reference to the Microsoft Scripting Runtime
Public Sub Test()
Dim myHistory As Scripting.Dictionary
Set myHistory = GetHistoryDictionary("Clinical: history of heart disease Labs: elevated cholesterol on 8AugMeds: just started cholesterol medication Supplements: none Allergies: none Activity: recently started going to YMCA 3x/wk (elliptical and some weight lifting)")
Debug.Print VBA.Join(myHistory.keys, vbCrLf)
Debug.Print VBA.Join(myHistory.Items, vbCrLf)
Debug.Print
If myHistory.Exists("Labs") Then
Debug.Print "The Lab report was: " & myHistory.Item("Labs")
End If
Debug.Print
If myHistory.Exists("Heamatology") Then
Debug.Print "The Heamatolofy report was: " & myHistory.Item("Heamatology")
Else
Debug.Print "The Heamtology report was: " & "Not Present"
End If
End Sub
Public Function GetHistoryDictionary(ByVal ipString As String) As Scripting.Dictionary
' Create an array of the labes in the input strings
Static myLabels As Variant
If VBA.IsEmpty(myLabels) Then
myLabels = Split("Clinical:,Labs:,Meds:,Supps:,Allergies:,Activity:,NFPE:", ",")
End If
' Add a character we can use as a separator with SPlit
Dim myLabel As Variant
For Each myLabel In myLabels
ipString = VBA.Replace(ipString, myLabel, "#" & myLabel)
Next
' remove characters until we have removed the first separator character
Do Until VBA.Left(ipString, 1) = "#"
ipString = VBA.Mid$(ipString, 2)
Loop
ipString = VBA.Mid$(ipString, 2)
'Get an array of Label/Message
Dim myItems As Variant
myItems = VBA.Split(ipString, "#")
'Split the label/message and put into a scripting.dictionary
Dim myHistory As Scripting.Dictionary
Set myHistory = New Scripting.Dictionary
Dim myItem As Variant
For Each myItem In myItems
Dim mySPlit As Variant
mySPlit = VBA.Split(myItem, ":")
myHistory.Add mySPlit(0), mySPlit(1)
Next
Set GetHistoryDictionary = myHistory
End Function
Building on your code:
First ensure you have Option Explicit at the top of all your modules as this will help pick out any simple errors.
In your UserForm you could have text boxes labeled TextBox1, TextBox2 etc. Then you could use this for the command button code:
Private Sub CommandButton1_Click()
Dim strnames(1 To 7) As String
strnames(1) = "Clinical: "
strnames(2) = "Labs: "
strnames(3) = "Meds: "
strnames(4) = "Supps: "
strnames(5) = "Allergies: "
strnames(6) = "Activity: "
strnames(7) = "NFPE: "
Dim str1 As String
str1 = TextBox1.Text
' It makes the code clearer if you are explicit about what you want
' from your text box - .Text (or .Value), even if VBA will
' give you its value if you don't specify it.
Dim str2 As String
Dim ctlControl As Control
Dim lngTextBoxNumber As Long
' You need to loop through all the controls on the form, and then
' determine which are the ones you want to alter. This assumes each
' textbox you are interested in is named in the form
' TextBox1, TextBox2 etc. To make code maintenance easier, I would
' probably put this kind of identification information on the
' controls' tag properties - that way if you rename the controls or
' you add a text box which is for something else, you won't break
' the code. You would then be reading this information off the
' .Tag property rather than .Name.
For Each ctlControl In Me.Controls
If Mid$(ctlControl.Name, 1, 7) = "TextBox" Then
lngTextBoxNumber = CLng(Mid$(ctlControl.Name, 8))
If lngTextBoxNumber > 1 And lngTextBoxNumber < UBound(strnames) Then
str2 = SuperMid(str1, strnames(lngTextBoxNumber), strnames(lngTextBoxNumber + 1))
If str2 = vbNullString Then
str2 = "none"
End If
ctlControl.Text = str2
End If
End If
Next ctlControl
End Sub
SuperMid seems to be quite an unforgiving function - as you have it, if it can't find the text before and after the text you are looking for, it will fail with an error: it might be better for it to return an empty string - otherwise your code will fail not all the strnames are present in your original string.
I altered the end of that function to look like this:
Exit Function
errhandler:
'MsgBox "Error extracting strings. Check your input" & vbNewLine & vbNewLine & "Aborting", , "Strings not found"
SuperMid = vbNullString
End Function
As it stands, your code would fail to pick up some of the information if items are left out, or had been entered in a different order: see freeflow's answer to avoid this.
I would skip the array because what you're really looking to do is to extract the phrase following the keyword. The example below shows how you can use a function to isolate the phrase.
Function ExtractByKeyword(ByVal source As String, _
ByVal keyword As String) As String
'--- extracts a phrase (substring) from the given source,
' beginning with the keyword and ending with the next
' (unknown) keyword.
' Keywords are delimited by a preceding space ' ' and
' followed by a colon ":" or EOL
Dim pos1 As Long
pos1 = InStr(1, source, keyword, vbTextCompare)
If pos1 = 0 Then
'--- the keyword was not found, so return a null string
ExtractByKeyword = vbNullString
Exit Function
End If
Dim phrase As String
'--- skip over the keyword and find the next keyword
' (i.e. look for the next colon)
Dim pos2 As Long
pos2 = InStr(pos1 + Len(keyword) + 1, source, ":", vbTextCompare)
If pos2 = 0 Then
'--- this is the last keyword and phrase in the source
phrase = Right$(source, Len(source) - pos1 - Len(keyword) - 1)
Else
'--- now work backwards from the second keyword to find the
' end of the phrase (which is the space just before the
' second keyword
Dim pos3 As Long
pos3 = InStrRev(source, " ", pos2, vbTextCompare)
Dim startsAt As Long
Dim phraseLen As Long
startsAt = pos1 + Len(keyword) + 2
phraseLen = pos3 - startsAt
phrase = Mid$(source, startsAt, phraseLen)
End If
ExtractByKeyword = phrase
End Function
I used the test routine below to check the extraction:
Option Explicit
Sub test()
Const medInfo As String = "Clinical: history of heart disease" & _
" Labs: elevated cholesterol on 8Aug" & _
" Meds: just started cholesterol medication" & _
" Supplements: none" & _
" Allergies: none" & _
" Activity: recently started going to YMCA 3x/wk (elliptical and some weight lifting"
Dim phrase As String
phrase = ExtractByKeyword(medInfo, "Labs")
If phrase <> vbNullString Then
Debug.Print " Labs -> '" & phrase & "'"
Else
Debug.Print "Keyword not found!"
End If
phrase = ExtractByKeyword(medInfo, "Clinical")
If phrase <> vbNullString Then
Debug.Print " Clinical -> '" & phrase & "'"
Else
Debug.Print "Keyword not found!"
End If
phrase = ExtractByKeyword(medInfo, "Activity")
If phrase <> vbNullString Then
Debug.Print " Activity -> '" & phrase & "'"
Else
Debug.Print "Keyword not found!"
End If
phrase = ExtractByKeyword(medInfo, "Meds")
If phrase <> vbNullString Then
Debug.Print " Meds -> '" & phrase & "'"
Else
Debug.Print "Keyword not found!"
End If
phrase = ExtractByKeyword(medInfo, "Allergies")
If phrase <> vbNullString Then
Debug.Print "Allergies -> '" & phrase & "'"
Else
Debug.Print "Keyword not found!"
End If
End Sub
How can I build an array if I have multiple delimiters, where some delimiters are single character and others are multiple characters?
Sub Example()
Dim exString As String
Dim myString() As String
exString = "A,B C;D > E"
myString() = Split(exString, "," & " " & ";" & " > ")
End Sub
The result I want in my array is:
myString(0) is A
myString(1) is B
myString(2) is C
myString(3) is D
myString(4) is E
But using Split() in this way doesn't work. I do know that I can use Replace() to replace every single delimiter with a common one, but I have a lot of different delimiters and variations of multiple character delimiters. Using Replace() isn't desirable to me. What can I do?
You can have lots of problems in VBA as well:
'Add a reference to Microsoft VBScript Regular Expressions 5.5 (Tools -> References...)
Dim exString As String
exString = "A,B C;D > E"
Dim re As New RegExp
re.Pattern = "(,| |;|>)+"
re.Global = True
Dim myString() As String
myString = Split(re.Replace("A,B C;D > E", ","), ",")
Setting re.Pattern defines what to look for. | represents finding A or B, so the regular expression will match on , or or ; or >.
Multiple instances should be treated as one (e.g. between the D and the E there are three characters, but there should be only one split), so add a + at the end (and wrap everything else in ()).
Replace then replaces any of the matched patterns with , and gives back a string like this:
A,B,C,D,E
on which we can simply call Split to get back the array.
Reference: VBScript Regular Expressions
Instead of using regular expressions to match the delimiter characters, you could use regexes to match the non-delimiter characters:
Dim re As New RegExp
re.Pattern = "[^, ;>]+" 'The ^ unmatches any characters within the []
re.Global = True
Dim match As Match
For Each match In re.Execute(exString)
'do something with each result here
Debug.Print match.Value
Next
This is sufficient if all you need is to iterate over the results and do something with them. If you specifically need an array with the results:
Dim re As New RegExp
re.Pattern = "[^, ;>]+"
re.Global = True
Dim matches As MatchCollection
Set matches = re.Execute(exString)
ReDim myString(matches.Count) As String
Dim i As Integer
For i = 0 To matches.Count - 1
myString(i) = matches(i).Value
Next
You were on the right track with your function. Using a ParamArray you can easily change the number and position of your delimiters.
Code
Function MultiSplit(SourceText As String, ParamArray Delimiters()) As String()
Dim v As Variant
For Each v In Delimiters
SourceText = Replace(SourceText, v, "•")
Next
MultiSplit = Split(SourceText, "•")
End Function
Test
Sub Test()
Const example As String = "A,B C;D > E"
Dim a1, a2, a3, Incorrect
Incorrect = MultiSplit(example, " ", " > ")
a1 = MultiSplit(example, " > ", ",", ";", " ")
a2 = MultiSplit(example, " > ", ",")
a3 = MultiSplit(example, " > ")
End Sub
Result
NOTE: When using multi-character delimiters, the order that the delimiters are processed matters. Notice that A1 is split proper but Incorrect is not split as intended because the space delimiter came before " > ".
In this situation, I found the following function to be perfect for my needs:
Function MultiSplit(SourceText As String, Optional SingleCharDelimiter As String, Optional MultiCharDelimiter As String, _
Optional Separator As String) As String()
'Created by Tyeler for use by all.
'SourceText is your input string.
'SingleCharDelimiter is a string of desired delimiters.
'SingleCharDelimiter format is a string fully concatenated with no character separation.
' (ex. "-.;:, " MultiSplit will use those 6 characters as delimiters)
'SingleCharDelimiter's will remove blanks from the array in the event two single delimiters
' are next to each other.
'MultiCharDelimiter is a string of specific multi-character delimiters.
'MultiCharDelimiters can be separated by the optional Separator
'Separator is an optional value used to separate multiple MultiCharDelimiters.
' (ex. MultiCharDelimiter = "A A,B B,C C" // Separator = "," // This will make the function
' delimit a string by "A A", "B B", and "C C")
'MultiSplit will make an array based on any delimiter (Including delimiters with
' multiple characters).
If MultiCharDelimiter = "" And SingleCharDelimiter = "" Then Exit Function
Dim i As Integer, n As Integer, dlimit
Dim delColl As New Collection
Dim newString As String: newString = SourceText
Dim delArr() As String, strgArr() As String, delFull() As String
Dim delSep As String, a As Integer: a = 33
Do While InStr(SingleCharDelimiter, Chr(a)) <> 0 Or InStr(MultiCharDelimiter, Chr(a)) <> 0 _
Or InStr(Separator, Chr(a)) <> 0 Or InStr(SourceString, Chr(a)) <> 0 'Find intermediate delimiter
a = a + 1
Loop
delSep = Chr(a)
If MultiCharDelimiter <> "" Then
If Separator <> "" Then 'If there's no delimiter for the delimiter array, assume MultiCharDelimiter is the delimiter
delArr() = Split(MultiCharDelimiter, Separator)
For i = 0 To UBound(delArr)
If InStr(newString, delArr(i)) <> 0 Then newString = Replace(newString, delArr(i), delSep)
Next i
Else
newString = Replace(newString, MultiCharDelimiter, delSep)
End If
End If
Erase delArr
For i = 1 To Len(SingleCharDelimiter) 'Build a collection of user defined delimiters
delColl.Add Mid(SingleCharDelimiter, i, 1)
Next i
For Each dlimit In delColl 'Replace all delimiters in the string with a single common one
newString = Replace(newString, dlimit, delSep)
Next dlimit
strgArr() = Split(newString, delSep)
ReDim delFull(LBound(strgArr) To UBound(strgArr))
n = LBound(strgArr)
For i = LBound(strgArr) To UBound(strgArr) 'Get rid of empty array items
If strgArr(i) <> "" Then
delFull(n) = strgArr(i)
n = n + 1
End If
Next i
n = n - 1
ReDim Preserve delFull(LBound(strgArr) To n)
MultiSplit = delFull 'Send the delimited array
Erase delFull
Erase strgArr
End Function
This function will return an array of values that were separated by user defined delimiters.
To use this function, simply call on it and supply your full string and desired delimiters:
Sub Example1()
Dim exString As String
Dim myString() As String
Dim c, n
exString = "A,B C;D > E"
myString() = MultiSplit(exString, ", ;", " > ")
n = 0
For Each c In myString
Debug.Print "(" & n & ") = " & c
n = n + 1
Next c
End Sub
This will yield the desired result where the array is filled with only ABCDE.
A more complicated example:
Sub Example2()
Dim myString As String, c, n
n = 0
myString = "The,Quickupside-downBrownjelloFox_Jumped[Over] ThegiantLazyjelloDog"
For Each c In MultiSplit(myString, ",_[] ", "upside-down,jello,giant", ",")
Debug.Print "(" & n & ") = " & c
n = n + 1
Next c
End Sub
This will yield the following:
The following is a built-upon version of the code that Thomas Inzina graciously provided.
The following limitations have been removed:
The order that the delimiters are listed in the function.
The temporary delimiter being a set specific character.
The option to include or remove empty array items.
The function changing the reference (ByRef vs ByVal)
Passing an array of delimiters vs listing individual delimiters
Function MultiSplitX(ByVal SourceText As String, RemoveBlankItems As Boolean, ParamArray Delimiters()) As String()
Dim a As Integer, b As Integer, n As Integer
Dim i As Integer: i = 251
Dim u As Variant, v As Variant
Dim tempArr() As String, finalArr() As String, fDelimiters() As String
If InStr(TypeName(Delimiters(0)), "()") <> 0 And LBound(Delimiters) = UBound(Delimiters) Then
ReDim fDelimiters(LBound(Delimiters(0)) To UBound(Delimiters(0))) 'If passing array vs array items then
For a = LBound(Delimiters(0)) To UBound(Delimiters(0)) 'build that array
fDelimiters(a) = Delimiters(0)(a)
Next a
Else
fDelimiters = Delimiters(0)
End If
Do While InStr(SourceText, Chr(i)) <> 0 And i < 251 'Find an unused character
i = i + 1
Loop
If i = 251 Then 'If no unused character in SourceText, use single character delimiter from supplied
For a = LBound(fDelimiters) To UBound(fDelimiters)
If Len(fDelimiters(a)) = 1 Then i = Asc(fDelimiters(a))
Next a
End If
If i = 251 Then 'If no single character delimiters can be used, error.
MsgBox "SourceText uses all character type." & vbCrLf & "Cannot split SourceText into an array.", _
vbCritical, "MultiSplitX Run-Time Error"
Exit Function
End If
Debug.Print i
For a = LBound(fDelimiters) To UBound(fDelimiters) 'Sort Delimiters by length
For b = a + 1 To UBound(fDelimiters)
If Len(fDelimiters(a)) < Len(fDelimiters(b)) Then
u = fDelimiters(b)
fDelimiters(b) = fDelimiters(a)
fDelimiters(a) = u
End If
Next b
Next a
For Each v In fDelimiters 'Replace Delimiters with a common character
SourceText = Replace(SourceText, v, Chr(i))
Next
tempArr() = Split(SourceText, Chr(i)) 'Remove empty array items
If RemoveBlankItems = True Then
ReDim finalArr(LBound(tempArr) To UBound(tempArr))
n = LBound(tempArr)
For i = LBound(tempArr) To UBound(tempArr)
If tempArr(i) <> "" Then
finalArr(n) = tempArr(i)
n = n + 1
End If
Next i
n = n - 1
ReDim Preserve finalArr(LBound(tempArr) To n)
MultiSplitX = finalArr
Else: MultiSplitX = tempArr
End If
End Function
Use of this function doesn't change from how Thomas had it, with the exception that there's an added boolean statement.
Example 1
In this example, RemoveBlankItems has been set to True.
Sub Example1()
Dim myString As String, c, n
n = 0
myString = "The,Quickupside-downBrownjelloFox_Jumped[Over] ThegiantLazyjelloDog"
For Each c In MultiSplitX(myString, True, ",", "-", "upside-down", "jello", " ", "[", "]", "giant", "_")
Debug.Print "(" & n & ") = " & c
n = n + 1
Next c
End Sub
This results in the following output:
Example 2
In this example we have RemoveBlankItems set to False.
Sub Example2()
Dim myString As String, c, n
n = 0
myString = "The,Quickupside-downBrownjelloFox_Jumped[Over] ThegiantLazyjelloDog"
For Each c In MultiSplitX(myString, True, ",", "-", "upside-down", "jello", " ", "[", "]", "giant", "_")
Debug.Print "(" & n & ") = " & c
n = n + 1
Next c
Debug.Print myString
End Sub
This results in the following output:
Example 3
In this example, instead of listing our delimiters in the function, we have them typed out in a string and insert an array in the function instead:
Sub Example3()
Dim myString As String, c, n
Dim myDelimiters As String
n = 0
myString = "The,Quickupside-downBrownjelloFox_Jumped[Over] ThegiantLazyjelloDog"
myDelimiters = ",|-|upside-down|jello| |[|]|giant|_"
For Each c In MultiSplitX(myString, True, Split(myDelimiters, "|"))
Debug.Print "(" & n & ") = " & c
n = n + 1
Next c
Debug.Print myString
End Sub
This has the same result as if they had been listed individually:
The Reason RemoveBlankItems Is Desirable
There are some instances in which you DON'T want to have blanks in your array. An example of this would be if you're using your array as a bank of search words that are cycling through a range on a spread sheet. Another example would be if you're manipulating strings of text based on values in the array.
There are also times when you would want to retain the blanks in the array. As Thomas described, in the event you're using this on a CSV file, where maintaining the spaces as columns is desired. Or you're using it to break apart, for example, HTML coding and wish to retain the line format.
Perhaps:
Sub Example()
Dim exString As String
Dim myString() As String
exString = "A,B C;D > E"
exString = Replace(exString, ",", " ")
exString = Replace(exString, ";", " ")
exString = Replace(exString, ">", " ")
exString = Application.WorksheetFunction.Trim(exString)
myString() = Split(exString, " ")
msg = ""
For Each a In myString
msg = msg & vbCrLf & a
Next a
MsgBox msg
End Sub
why the function below works sometimes and fails at others?
The function idea
a string gets passed to this function by a user or a procedure
the string will be split by multiple delimiters into an array of words
each word is cleaned from non-letter characters except for the characters (dash and single quote mark inside the word) an example of words having these two char. would be (man's spirit, life-loving)
if the single quote marks are found surrounding the word, they get trimmed too
after processing each word in the string, and if the word is not empty it gets stored into the array Along with it's starting position and end position
the start and end positions refer to the position of that word within the input string and excluding any non-letter characters that might surround the word.
Here's the code
Sub test()
Dim d$: d = ThisDocument.Range.Text
Dim Arr(), i&
Arr = ExtractWordsFromDoc_2(d)
For i = 0 To UBound(Arr)
ThisDocument.Range(Arr(i)(1) - 1, Arr(i)(2)).HighlightColorIndex = wdBrightGreen
Next
End Sub
Function ExtractWordsFromDoc_2(ByRef doc As Document, Optional ByVal Delimiters)
' Take a string, and return it as a one dimensional array of individual arrys, each array
' has three values (single delimited string, start range of delimited str, end range of delimited str)
' the input string is delimited by any of several characters. None of those characters are returned in
' the result. Provide a default list of Delimiters, which Should come from registry.
' But allow override.
'===================================================================================================================
Dim InputString$: InputString = doc.Range.Text
'return an array of empty string when input string is empty
If InputString = "" Then
ExtractWordsFromDoc_2 = Array("", 0, 0)
Exit Function
End If
'===================================================================================================================
Dim DelimitList As Variant, ArrayOfWords() As Variant, TmpArr() As Variant
Dim OneChar$, TempWord$, WordCount&, InputStringLength&, CharIndex&, ArrUbound&
'===================================================================================================================
'if delimiters are missing, We should get these from a Registry
If IsMissing(Delimiters) Then
DelimitList = Chr(34) & Chr(147) & Chr(148) & Chr(32) & "," & "." & vbCr & vbTab & "/" & "!" & "|" & ";" & ")" & "(" & "?"
'Chr(34)= straight double quotes mark
'Chr(147) & Chr(148) =opening and closing double quotes marks
'Chr(32) = space
Else
DelimitList = Delimiters 'user can override if needed
End If
'===================================================================================================================
InputStringLength = Len(InputString) 'get the input string length
For CharIndex = 1 To InputStringLength 'loop through each character
OneChar = VBA.Strings.Mid(InputString, CharIndex, 1) 'Read one character at a time
Select Case InStr(DelimitList, OneChar) 'Test if the character is a delimiter character
Case 0 'it is not a delimiter
TempWord = TempWord & OneChar 'Add the character to the current word
Case Is <> 0, Is = InputStringLength 'it is a delimiter or it is the last character
'if the temp word is not empty and not a quotation mark
If TempWord > "" And Not (TempWord = "'" Or TempWord = Chr(145) Or TempWord = Chr(146)) Then
TmpArr = TrimSingQuotes(TempWord) 'send that word to be cleaned from single quotaion mark
If (Not TmpArr(0) = "") Then 'if the returned word has length, count it
WordCount = WordCount + 1
ArrUbound = WordCount - 1 'set the new upper dimension for the storing array
ReDim Preserve ArrayOfWords(ArrUbound) 'expand storing array when we have a cleaned word with length
'Save new word in the last place inside the array, along with the word start and end ranges
ArrayOfWords(ArrUbound) = Array(TmpArr(0), _
CharIndex - Len(TempWord) + TmpArr(1) - 1, _
CharIndex - Len(TempWord) + TmpArr(2) - 1)
End If
TempWord = "" 'reset the Temp Word
End If
End Select
Next CharIndex
'===================================================================================================================
ExtractWordsFromDoc_2 = ArrayOfWords 'Return the storing array through function name
'do some cleaning
Erase ArrayOfWords
Erase TmpArr
End Function
Sub testTrimSingQuotes()
TrimSingQuotes (Empty)
End Sub
Function TrimSingQuotes(ByVal TempWord$)
'SSQP =starting single quote position
'ESQP = ending single quote position
'==================================================================
If TempWord = "" Then
TrimSingQuotes = Array("", 0, 0)
Exit Function
End If
'==================================================================
Dim SSQP&: SSQP = 1
Dim ESQP&: ESQP = Len(TempWord)
'==================================================================
'trim starting single quotes
Do While (Mid(TempWord, SSQP, 1) = "'" Or Mid(TempWord, SSQP, 1) = Chr(145) Or Mid(TempWord, SSQP, 1) = Chr(146)) And SSQP < ESQP
SSQP = SSQP + 1
Loop
'==================================================================
'trim ending single quotes
Do While (Mid(TempWord, ESQP, 1) = "'" Or Mid(TempWord, ESQP, 1) = Chr(145) Or Mid(TempWord, ESQP, 1) = Chr(146)) And (ESQP > SSQP)
ESQP = ESQP - 1
Loop
'==================================================================
'get the trimmed word
TempWord = Mid(TempWord, SSQP, ESQP - SSQP + 1)
'==================================================================
'test the trimmed word for output
If TempWord > "" And Not (TempWord = "'" Or TempWord = Chr(145) Or TempWord = Chr(146)) Then
TrimSingQuotes = Array(TempWord, SSQP, ESQP)
Else
TrimSingQuotes = Array("", 0, 0)
End If
End Function
To be honest, I didn't spend a lot of time (i.e. none) figuring out why your code isn't working as intended. I suspect it has something to do with calculating your position in the input string.
It is much simpler to leverage the build in Split function to do your heavy lifting, and probably much more performant than relying on the string functions like Instr and Mid. Note that this relies on 2 quirks of the Split function:
First, if you call Split on an empty string, it returns an array with a UBound of -1.
Second, VBA's version of Split doesn't remove empty entries - so, Split("foo", "foo") results in the array { vbNullString, vbNullString }. This is good, because you can tell by the result how many delimiters were in the string based on the size of the resulting array (number of delimiters in the input will always equal the number of array elements minus one. In VBA terms, delimiterCount = UBound(Split(inputString, delimiter)).
Your requirement makes this easy in that your delimiters are all 1 character.
Try something like this:
Private Function MultiSplit(inValue As String, delimiters() As Variant) As Variant()
Dim output() As Variant
Dim bound As Long
ReDim Preserve output(bound)
Dim tokens() As String
Dim index As Long
tokens = Split(inValue, delimiters(0))
If UBound(tokens) = -1 Then
MultiSplit = Array(vbNullString, 0, 0)
Exit Function
End If
'Process each delimiter.
For index = 1 To UBound(delimiters)
tokens = SubSplit(tokens, CStr(delimiters(index)))
Next index
Dim position As Long
For index = LBound(tokens) To UBound(tokens)
If tokens(index) = vbNullString Then
'This means a delimiter was removed, so increment the position to account for it.
position = position + 1
Else
'Resize the output array and write the result for the remaining token.
ReDim Preserve output(bound)
output(bound) = Array(tokens(index), position, position + Len(tokens(index)) - 1)
position = position + Len(tokens(index)) + 1
bound = bound + 1
End If
Next index
MultiSplit = output
End Function
Private Function SubSplit(inValue() As String, delimiter As String) As String()
Dim tokens() As String
Dim substring As Variant
Dim token As Variant
Dim output() As String
output = Split(vbNullString)
For Each substring In inValue
tokens = Split(substring, delimiter)
'Test for an empty token - these need to be preserved in the output.
If UBound(tokens) = -1 Then
ReDim Preserve output(UBound(output) + 1)
Else
For Each token In tokens
ReDim Preserve output(UBound(output) + 1)
output(UBound(output)) = token
Next token
End If
Next substring
SubSplit = output
End Function
Test code:
Private Function TestCode()
Dim delims() As Variant
Dim results() As Variant
Dim test As String
delims = Array(Chr(34), Chr(147), Chr(148), Chr(32), ",", ".", vbCr, vbTab, "/", "!", "|", ";", ")", "(", "?")
test = "foo|||bar,,baz?crux"
results = MultiSplit(test, delims)
Dim result As Variant
For Each result In results
Debug.Print result(0) & vbTab & result(1) & vbTab & result(2)
Next result
End Function
Note that I didn't examine the existing code even far enough to determine if your output positions were 1 or 0 based. The example above is 0 based. If you need 1 based, insert position = 1 after Dim position As Long.
Quote removal is left as an exercise for the reader.
I have problem with my matching function actually I have to count number of lines with specific string and return line number ,so I have one dimensional array of string that contain the unique strings of text file {33,7,77,3 23,6} and text file with the same strings in array I have read lines of text file to array , but with duplicate of these strings ,when I use regex.match it works not bad expect when I check if line 2 contain 3 the function return True it's consider 3 in 23 as 3 , and the above explanation is just example of what I need any help please
Module Module1
Sub Main()
Dim txt As String = File.ReadAllText("e:\ii.txt")
' Use regular expressions to replace characters
' that are not letters or numbers with spaces.
Dim reg_exp As New Regex("[^a-zA-Z0-9]")
txt = reg_exp.Replace(txt, " ")
' Split the text into words.
'Dim words() As String = txt.Split( _
' New Char() {" "c}, _
' StringSplitOptions.RemoveEmptyEntries)
Dim words = txt.Split(New String() {" ", Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
' Use LINQ to get the unique words.
Dim word_query = _
(From word As String In words _
Order By word _
Select word).Distinct()
Dim stra() As String = word_query.ToArray()
For i = 0 To stra.Length - 1
Console.WriteLine(" " & stra(i))
Next
Dim lines() As String = IO.File.ReadAllLines("e:\ii.txt")
For i = 0 To lines.Length - 1
Console.WriteLine(lines(1))
Dim linecount = 0
Dim regex As New Regex(stra(i), RegexOptions.ExplicitCapture)
Dim match As Match = regex.Match(lines(1))
If match.Success Then
linecount += 1
Console.WriteLine("linecount= " & linecount)
Else
Console.WriteLine("false")
End If
Next
End Sub
End Module
You many not have to split the text into words. Is your word list very long? From what I understand you want the following:
1.Read a text file and return the line number for a given word or phrase.
Is the word or phrase complex? If not, why not use a the Contains extension method?
For example:
Dim myString = "Hello World"
If myString.Contains("World") Then
'Add line number to line count.
End if
If you are using this as an opportunity to learn regular expressions, I highly recommend "Mastering Regular Expressions" by Jeffrey Friedl. When I first begun I invested in a program RegexBuddy, which is worth the money. But now there are so many online regex testers now, that could be an alternative for something free.
Enhance your regex with anchors. These will ascertain that the whole test string matches instead of a substring. The following code also assembles all match patterns of interest into a single regex pattern which will be used against each line of the target file:
Dim strall As String
strall = ""
For i = 0 To stra.Length - 1
If i > o Then
strall = strall & "|"
End If
strall = strall & stra(i)
Console.WriteLine(" " & stra(i))
Next
strall = "^(" & strall & ")$"
Dim regexall As New Regex(strall, RegexOptions.ExplicitCapture)
'...
Dim linecount = 0
Dim match As Match = regexall.Match(lines(i)) '... was 'lines(1)', probably a typo
If match.Success Then
'...
this code is working with me thanks for all
Public Function countlines(ByVal st As String) As Integer
Dim count As Integer
Dim linecount As Integer = 0
Dim substrings() As String = Regex.Split(st, " ")
Dim stt() As String = {23, 7, 3}
For i = 0 To stt.Length - 1
'For j = 0 To substrings.Length - 1
'Console.WriteLine(substrings(0))
'For i = 0 To substrings.Length - 1
'Console.Write(substrings(i))
Dim matchQuery = From word In substrings Where word.ToLowerInvariant() = stt(i).ToLowerInvariant() Select word
' ' Count the matches.
count = matchQuery.Count()
Console.WriteLine("count=" & count)
If count > 0 Then
linecount += 1
Else
Console.WriteLine(" linecount=" & linecount)
End If
Next
Console.WriteLine("linecount= " & linecount)
Return linecount
End Function
Sub Main()
Dim lines() As String = IO.File.ReadAllLines("e:\ii.txt")
For Each line As String In lines
countlines(line)
Next
End Sub
Through much research I have figured out a code to truncate sentances stored in cells to 100 characters or less, and add the excess to a second string. I have been really struggling trying to turn this into a function.
I would like to have the function accept a range of (1 column by various rows) OR, if that isn't possible, an Array of the same range values. Also there should be a way to set the number of characters that each output string can hold, output as an array of strings.
i.e. wordWrap(Input 'range or array', maxLength as integer) output of wordWrap will be an array of the results
Here is my current code:
Sub wordWrap()
'This procedure is intended to check the character length of a string and truncate all the words over 100 characters
'To a second string. (basically a word wrap)
Dim sumCount As Integer, newCount As Integer, i As Integer
Dim newString As String, newString2 As String
Dim words As Variant
Dim lenwords(0 To 1000) As Variant
Dim myRange As Range
sumCount = 0
newCount = 0
newString = ""
newString2 = ""
With Range("Q:Q")
.NumberFormat = "#"
End With
Set myRange = Range("B3")
words = Split(myRange.Value, " ")
For i = 0 To UBound(words)
lenwords(i) = Len(words(i))
Range("Q3").Offset(i, 0) = CStr(words(i)) 'DEBUG
Range("R3").Offset(i, 0) = lenwords(i) 'DEBUG
If sumCount + (lenwords(i) + 1) < 100 Then
sumCount = sumCount + (lenwords(i) + 1)
newString = newString & " " & words(i)
Else
newCount = newCount + (lenwords(i) + 1)
newString2 = newString2 & " " & words(i)
End If
Next
'DEBUG
Range("S3") = CStr(newString)
Range("T3") = Trim(CStr(newString2))
Range("S4") = Len(newString)
Range("T4") = Len(newString2)
ActiveSheet.UsedRange.Columns.AutoFit
End Sub
So if a range of ("B2:B6")or equivalent array are entered at max 100 characters:
c = wordWrap(Range("B2:B6"),100)
Basically what this should do is count the length of each cell(or element) and truncate any extra words that make the string over 100 characters and concatenate them to the front of the next element in the output array to the next element of the output array. If that would put that element over 100 characters, then do the same process again until all of the elements contain sentence strings less then 100 characters long. It should add an extra element at the end to fit any leftover words.
I have been tearing out my hair trying to get this to work. I could use the advice of the experts.
Any help appreciated.
Example asked for:
http://s21.postimg.org/iywbgy307/trunc_ex.jpg
The ouput should be into an array, though, and not directly back to the worksheet.
The function:
Function WordWrap(ByVal Rng As Range, Optional ByVal MaxLength As Long = 100) As String()
Dim rCell As Range
Dim arrOutput() As String
Dim sTemp As String
Dim OutputIndex As Long
Dim i As Long
ReDim arrOutput(1 To Evaluate("CEILING(SUM(LEN(" & Rng.Address(External:=True) & "))," & MaxLength & ")/" & MaxLength) * 2)
For Each rCell In Rng.Cells
If Len(Trim(sTemp & " " & rCell.Text)) > MaxLength Then
OutputIndex = OutputIndex + 1
arrOutput(OutputIndex) = Trim(Left(sTemp & " " & rCell.Text, InStrRev(Left(sTemp & " " & rCell.Text, MaxLength), " ")))
sTemp = Trim(Mid(sTemp & " " & rCell.Text, Len(arrOutput(OutputIndex)) + 2))
For i = 1 To Len(sTemp) Step MaxLength
If Len(sTemp) < MaxLength Then Exit For
OutputIndex = OutputIndex + 1
arrOutput(OutputIndex) = Trim(Left(sTemp, InStrRev(Left(sTemp, MaxLength), " ")))
sTemp = Trim(Mid(sTemp, Len(arrOutput(OutputIndex)) + 2))
Next i
Else
OutputIndex = OutputIndex + 1
arrOutput(OutputIndex) = Trim(sTemp & " " & rCell.Text)
sTemp = ""
End If
Next rCell
OutputIndex = OutputIndex + 1
arrOutput(OutputIndex) = sTemp
ReDim Preserve arrOutput(1 To OutputIndex)
WordWrap = arrOutput
Erase arrOutput
End Function
The call:
Sub tgr()
Dim arrWrapped() As String
arrWrapped = WordWrap(Range("B2:B6"), 100)
MsgBox Join(arrWrapped, Chr(10) & Chr(10))
End Sub
Instead of a msgbox, you could output it to a sheet, or do whatever else you wanted.
going to say you get passed a string, and want to return an array
performance might be slow with this approach
dim words(1) as variant
dim lastSpace as Integer
dim i as Integer
words(1) = Cells(1, 1)
while(Len(words(UBound(words) - 1)) > 100) 'check if the newest array is > 100 characters
Redim words(UBound(words) + 1)
'find the last space
for i = 0 to 100
if(words(i) = " ") Then
lastSpace = i
EndIF
Next
words(UBound(words) - 1) = Mid(words(UBound(words) - 2), lastSpace) 'copy words after the last space before the 100th character
words(UBound(words) - 2) = Left(words(UBound(words) - 2), 100 - lastSpace) 'copy the words from the beginning to the last space
Wend
Not sure if this will compile/run but it should give you the general idea