Improve performance finding matches in a big list/array

I am building a tool to find images in a big folder of images (400k images). In that folder I have images like this:
c:\images\100001_01.jpg
c:\images\100001_05.jpg
c:\images\100001_07.jpg
c:\images\100005_05.jpg
c:\images\100010_00.jpg
Then I have my references in a text box, but only the 6-digit number:
100001
100005
100006
Etc
Say I have 1000 references for which I need the images. I want to go through the whole image folder and take a file if one exists for a reference. I have built this two ways: with an array and nested loops, and with a list and FindIndex. I thought getting the index from the list would be much faster, but the two perform about the same.
Here are the two routines I have developed. The first uses a list and calls FindIndex to get the index. The second loops through all the references and, for each one, loops through all the images to check whether any contains that reference. That is up to 400 million iterations for a set of 1000 references!
Using the list takes 69 seconds, while looping through the arrays takes 64 seconds. On top of that, just reading all the images in the directory with GetFiles already takes 120 seconds.
Can you think of any way to make this faster?
Private Sub ExtractImagesUsingList()
    Dim ListOfReferences As New List(Of String) 'the actual list of references is in a textbox, ie.: 100001, 100002, etc
    For Each line In txtBox.Lines
        ListOfReferences.Add(line.ToString)
    Next
    Dim ListOfimages As New List(Of String)
    For Each file In IO.Directory.GetFiles("c:\images\")
        ListOfimages.Add(file)
    Next
    For Each ref In ListOfReferences
        Dim index As Integer = ListOfimages.FindIndex(Function(x As String) x.Contains(ref))
    Next
End Sub
Private Sub ExtractImagesUsingArrayLoop()
    Dim ListOfReferences As New List(Of String) 'the actual list of references is in a textbox
    For Each line In txtBox.Lines
        If line.Length > 1 Then
            ListOfReferences.Add(line.ToString)
        End If
    Next
    Dim ArrayImages() As String = IO.Directory.GetFiles("c:\images\")
    For Each reference In ListOfReferences
        For Each image In ArrayImages
            If image.Contains(reference) Then
                Exit For ' I exit the FOR here because I am only interested in one image per reference
            End If
        Next
    Next
End Sub

It's not 100% clear what you're trying to accomplish, but you could improve performance by using Directory.EnumerateFiles instead of Directory.GetFiles. EnumerateFiles returns paths lazily, so you can start processing before the whole directory has been read.
Additionally, you could pair this with the built-in search pattern, using a wildcard match built from each known reference key in the reference list.
Finally, if the image directory is not subject to frequent changes, you could cache the images in a dictionary to speed up future searches.
Here is a rough example of those ideas. Note that I've removed any references to form controls and assume the values are passed in as parameters instead.
' Requires Imports System.IO and Imports System.Linq at the top of the file.
Private _imageMap As Dictionary(Of String, ICollection(Of String))
Public ReadOnly Property ImageMap As Dictionary(Of String, ICollection(Of String))
    Get
        If _imageMap Is Nothing Then
            _imageMap = New Dictionary(Of String, ICollection(Of String))()
        End If
        Return _imageMap
    End Get
End Property
Public Sub RefreshImageMap()
    ' Dropping the map forces the next lookup to search the folder again.
    _imageMap = Nothing
End Sub
Public Function GetImagePaths(imageFolder As String, referenceKey As String) As ICollection(Of String)
    Dim imagePaths As ICollection(Of String) = Nothing
    If Not ImageMap.TryGetValue(referenceKey, imagePaths) Then
        ' Not cached yet: search the folder once and remember the result.
        imagePaths = Directory.EnumerateFiles(imageFolder, $"{referenceKey}_*.jpg").ToList()
        ImageMap.Add(referenceKey, imagePaths)
    End If
    Return imagePaths
End Function
Also if you wanted to run multiple reference keys through a function, but also needed to keep the original reference key passed in, you could add something like this:
Public Iterator Function GetImagePaths(imageFolder As String, referenceKeys As IEnumerable(Of String)) As IEnumerable(Of KeyValuePair(Of String, ICollection(Of String)))
    For Each referenceKey As String In referenceKeys
        Dim imagePaths = GetImagePaths(imageFolder, referenceKey)
        Yield New KeyValuePair(Of String, ICollection(Of String))(referenceKey, imagePaths)
    Next
End Function
None of this is tested, and this lacks proper parameter checks on functions, but it should give you a direction to try.
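An alternative that avoids one directory search per reference, sketched under assumptions rather than taken from the answer above: load the references into a HashSet, walk the folder once, and look up each file's prefix in the set. The names ReferenceLookup and FindFirstImagePerReference are hypothetical, and the code assumes every file name has the form <reference>_<nn>.jpg.

```vbnet
Imports System
Imports System.Collections.Generic

Module ReferenceLookup
    ' Hypothetical helper: returns the first image path seen for each wanted reference.
    ' Assumption: file names look like "<reference>_<nn>.jpg".
    Public Function FindFirstImagePerReference(imagePaths As IEnumerable(Of String),
                                               references As IEnumerable(Of String)) As Dictionary(Of String, String)
        Dim wanted As New HashSet(Of String)(references)
        Dim found As New Dictionary(Of String, String)
        For Each imagePath As String In imagePaths
            Dim name As String = System.IO.Path.GetFileNameWithoutExtension(imagePath)
            Dim sep As Integer = name.IndexOf("_"c)
            If sep <= 0 Then Continue For ' no "<reference>_" prefix, skip the file
            Dim key As String = name.Substring(0, sep)
            If wanted.Contains(key) AndAlso Not found.ContainsKey(key) Then
                found.Add(key, imagePath)
                ' Stop early once every wanted reference has an image.
                If found.Count = wanted.Count Then Exit For
            End If
        Next
        Return found
    End Function
End Module
```

It could be called as FindFirstImagePerReference(Directory.EnumerateFiles("c:\images\"), txtBox.Lines). Each file then costs one hash lookup, so the work grows with the number of files rather than with files times references.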

Related

Error reading JArray from JsonReader VB.net

Where does it go wrong?
My code:
Imports Newtonsoft.Json
Imports Newtonsoft.Json.Linq
Imports System.Net
Public Class DigiposAJA
    Private Sub CekPaket()
        Dim json As String = (New WebClient).DownloadString("http://192.168.101.1:100/list_product?username=SIP12&category=ROAMING&to=0811&payment_method=LINKAJA&json=1")
        Dim jarr As JArray = Linq.JArray.Parse(json)
        Dim sKatagori As String
        For Each jtk As JToken In jarr
            sKatagori = jtk.SelectToken("kategori")
            DgvDigipos.Rows.Add()
            DgvDigipos.Rows(DgvDigipos.Rows.Count - 1).Cells("DgvKategori").Value = sKatagori
        Next
    End Sub
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        CekPaket()
    End Sub
End Class
When I debug, the result is an error like this:
Newtonsoft.Json.JsonReaderException: 'Error reading JArray from JsonReader. Current JsonReader item is not an array: StartObject. Path '', line 1, position 1.'
Can you help me get the correct result?
Most likely this is a result of your call to the web service not returning the result you expect. The exception message says the first token is StartObject, so the response is a JSON object rather than a JSON array.
This is actually a good example of the benefits of separation of concerns and strongly typed objects. Your sub CekPaket should be broken down into three parts. 1) Get the string. This should be a function; it should read the endpoint from some sort of configuration and have appropriate guards for failure. 2) Parse the string into a strongly typed object (an IEnumerable of whatever type fits). This step should include validation to make sure that the input is good, and you might want to make the function public for easy testing. 3) Bind your results to your UI. It looks like you are doing this part by hand; whenever possible, let the framework do it for you by providing a data source and a template for the display.
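A minimal sketch of the validation in the parsing step, assuming Newtonsoft.Json is referenced as in the question. The names JsonShape and TryParseArray are hypothetical. The helper parses the text as a generic JToken first and only hands back a JArray when the root really is an array, so the caller can report an unexpected response instead of crashing.

```vbnet
Imports Newtonsoft.Json.Linq

Module JsonShape
    ' Hypothetical helper: returns the root array, or Nothing when the root
    ' of the JSON text is something else (for example an object).
    ' Text that is not valid JSON at all still raises JsonReaderException.
    Public Function TryParseArray(json As String) As JArray
        Dim root As JToken = JToken.Parse(json)
        Return TryCast(root, JArray)
    End Function
End Module
```

In CekPaket, a result of Nothing would be the cue to inspect the raw string the service returned before trying to fill the grid.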

Manipulating Array After Pulling From Text File

I've been thoroughly combing StackOverflow and other sources for the answers to these problems, and have not been able to find a solution that would work cohesively with the steps I need to accomplish.
Things I need to do:
1. Create an array from a text file and display in a listbox (this is done and works)
2. Have user fill in a text box, click a button, and the array is searched for anything matching the text box's value
3. Have the results of the search displayed in a separate listbox
Here's what I've got so far, and it's fairly hacked together, so if there's anything that can be improved, naturally, I'd be all for that.
Public Class Form1
    Dim lblName As Object
    Public colleges As String
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim colleges() As String = IO.File.ReadAllLines("Colleges.txt")
        ListBoxCollege.Items.AddRange(colleges)
    End Sub
    Private Sub btnSearchGo_Click(sender As Object, e As EventArgs) Handles btnSearchGo.Click
        Dim n As Integer, college As String
        college = txtCollegeSearchUserInput.Text
        n = Array.IndexOf(colleges(), college)
        If n <> 1 Then
            [[Needs to output into new listbox, preferably here]]
        End If
    End Sub
If there's anything else needed from VB, I can provide if necessary!
In your case you can do something like this:
For i As Integer = 0 To ListBoxCollege.Items.Count - 1
    If ListBoxCollege.Items(i).ToString().IndexOf(college, StringComparison.OrdinalIgnoreCase) > -1 Then
        findList.Items.Add(ListBoxCollege.Items(i))
    End If
Next
The difference is that you call IndexOf on the array, while I call it on each item in the list. Therefore I get all matches, whereas you get only the first one, and only when an element equals the search text exactly.
This is a little limited as search criteria go. You could use a regex as well for wildcards etc. Alternatively, you could store your data (colleges) in a System.Data.DataTable and then run SQL-like Select queries on it, almost as in a database.
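The regex idea could look roughly like the sketch below. The names WildcardSearch and FindWildcardMatches are hypothetical, and the wildcard rules are an assumption: "*" stands for any run of characters and "?" for exactly one. The pattern is escaped first so that every other character is matched literally.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module WildcardSearch
    ' Hypothetical helper: returns every item that matches the whole pattern,
    ' ignoring case. "*" = any run of characters, "?" = one character.
    Public Function FindWildcardMatches(items As IEnumerable(Of String), pattern As String) As List(Of String)
        ' Regex.Escape turns "*" into "\*" and "?" into "\?"; swap those for regex wildcards.
        Dim regexText As String = "^" & Regex.Escape(pattern).Replace("\*", ".*").Replace("\?", ".") & "$"
        Dim matcher As New Regex(regexText, RegexOptions.IgnoreCase)
        Dim matches As New List(Of String)
        For Each item As String In items
            If matcher.IsMatch(item) Then matches.Add(item)
        Next
        Return matches
    End Function
End Module
```

Because the pattern is anchored at both ends, a "contains" search needs the text wrapped in stars, for example "*" & college & "*".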

Creating array of cells in itextsharp for VB.NET

I am trying to create a pdf with a table in which each cell has lots of different properties (i.e. border widths, text font, etc.), so instead of having to write out the code 500 times for each individual cell, I want to have an array of cells. I have the following code:
Imports iTextSharp.text.pdf
Imports iTextSharp.text
Imports System.IO
Public Class pdfQuote
    Dim cell() As PdfPCell
    Dim table As New PdfPTable(7)
    Dim n As Integer = 0
    Public Sub createNewQuote()
        newCell("test")
    End Sub
    Public Sub newCell(text As String)
        n += 1
        cell(n) = New PdfPCell(New Phrase(text))
        table.AddCell(cell(n))
    End Sub
End Class
I get the error: "Object reference not set to an instance of an object"
Any help would be much appreciated.
Thanks in advance
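The error occurs because cell() is declared but never given a size, so the array is still Nothing when newCell assigns to cell(n). If you really want to make your life easier, switch from an array to a List. By doing that you don't need to keep track of the current index.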
Public Class pdfQuote
    Dim cells As New List(Of PdfPCell)
    Dim table As New PdfPTable(7)
    Public Sub createNewQuote()
        newCell("test")
    End Sub
    Public Sub newCell(text As String)
        cells.Add(New PdfPCell(New Phrase(text)))
        table.AddCell(cells.Last)
    End Sub
End Class
EDIT
Also, I'd recommend creating some helper methods so that you can share as much code as possible. This post has a simple example of that.

How to have a global Dictionary in VB.NET/WPF application to save data from different windows?

I am new to VB.NET and WPF.
I am building a "Questionnaire" app. Users will be presented sequentially with different questions/tasks (windows). After they respond on each question/task and press a "submit" button a new window will open with a new question/task, and previous window will close. After each question, when the button is pressed, I need to store data to some global object. After all questions are answered the data of this object should be written out to the output file.
I figured out that Dictionary will be the best to store the results after each window.
I am not sure how, where to create this global Dictionary and how to access it. Should I use View Model? If yes, can you give an example? Or, should it be just a simple class with shared property? (something like this)
EDIT 2: I tried many different ways recommended online
GlobalModule:
Module GlobalModule
    Public Foo As String
End Module
GlobalVariables:
Public Class GlobalVariables
    Public Shared UserName As String = "Tim Johnson"
    Public Shared UserAge As Integer = 39
End Class
Global properties:
Public Class Globals
    Public Shared Property One As String
        Get
            Return TryCast(Application.Current.Properties("One"), String)
        End Get
        Set(ByVal value As String)
            Application.Current.Properties("One") = value
        End Set
    End Property
    Public Shared Property Two As Integer
        Get
            Return Convert.ToInt32(Application.Current.Properties("Two"))
        End Get
        Set(ByVal value As Integer)
            Application.Current.Properties("Two") = value
        End Set
    End Property
End Class
Here is where I save the data to global variables/properties in the first window. I need to store data in this subroutine before closing an old window and opening a new window. I use MessageBox just for testing.
Private Sub btnEnter_Click(ByVal sender As Object, ByVal e As System.Windows.RoutedEventArgs) Handles btnEnter.Click
    Dim instructionWindow As InstructionsWindow
    instructionWindow = New InstructionsWindow()
    Application.Current.Properties("number") = textBoxValue.Text
    Globals.One = "2"
    Globals.Two = 3
    MessageBox.Show("GlobalVariables: UserName=" & GlobalVariables.UserName & " UserAge=" & GlobalVariables.UserAge)
    GlobalVariables.UserName = "Viktor"
    GlobalVariables.UserAge = 34
    GlobalModule.Foo = "Test Foo"
    'testing if it saved the value
    'MessageBox.Show(Application.Current.Properties("number"))
    Application.Current.MainWindow.Close()
    instructionWindow.ShowDialog()
End Sub
The next subroutine is where I am trying to retrieve the values from the global properties/variables in the second window, but the message boxes come out empty. It might also be the case that I am assigning the values in the wrong way, or not reading them in the right way (casting?):
Private Sub FlowDocReader_Initialized(ByVal sender As Object, ByVal e As System.EventArgs) Handles FlowDocReader.Initialized
    ' Get a reference to the Application base class instance.
    Dim currentApplication As Application = Application.Current
    MessageBox.Show(currentApplication.Properties("number"))
    MessageBox.Show("One = " & Globals.One & " Two = " & Globals.Two)
    MessageBox.Show("GlobalVariables: UserName=" & GlobalVariables.UserName & " UserAge=" & GlobalVariables.UserAge)
    MessageBox.Show("GlobalModule.Foo = " & GlobalModule.Foo)
    Dim filename As String = My.Computer.FileSystem.CurrentDirectory & "\instructions.txt"
    Dim paragraph As Paragraph = New Paragraph()
    paragraph.Inlines.Add(System.IO.File.ReadAllText(filename))
    Dim document As FlowDocument = New FlowDocument(paragraph)
    FlowDocReader.Document = document
End Sub
Thanks.
You can add a public Dictionary property to the form and assign your dictionary to it, or add a constructor that takes a Dictionary argument.
You already have this dictionary: Application.Properties.
Look here, please.
First, you can define a dictionary (of lists) as follows at the beginning of a form or in a module:
Dim dic As New Dictionary(Of String, List(Of String))
As the user completes questions on a form, write the particular form name and question results to a single record in the dic before going to the next form (place this code in the "Next" button):
'Assume q1response=3, q2response=4,..., qpresponse="text", etc.
Dim myValues As New List(Of String)
myValues.Add(formname)
myValues.Add(q1response)
myValues.Add(q2response)
.
.
myValues.Add(qpresponse)
dic.Add(username, myValues)
When a user is done, the dictionary holds a record whose key is their name and whose value is the form name followed by the question responses. You can loop through the dictionary records, where each record is for a user, using the following:
For Each entry In dic 'this loops through dic entries
    Dim str As List(Of String) = entry.Value
    'here you can do whatever you want with results while you read through dic records
    'username will be entry.Key
    'formname will be str(0)
    'q1 response on "formname" will be str(1)
    'q2 response on "formname" will be str(2)
    'q3 response on "formname" will be str(3)
    ...
Next
The idea is that there can be several records with results for one user, where record one can have results like "John Doe,page1,q1,q2,q3" and record two "John Doe,page2,q4,q5,q6". Note that a Dictionary accepts each key only once, so a second dic.Add with the same username throws an exception; to keep one record per page, the key has to differ per record (for example, the username combined with the form name). The "str" in the above loop is a List(Of String) containing all the items within one dictionary record, that is, str(0), str(1), str(2), ... This is the information you need to work with or move, save, analyze, etc.
You can always put all the code I provided in a class (which will be independent of any form), dimension the dic in a Sub New in this class, and put the lines that .Add the values in their own sub in the same class. Then just Dim Updater As New MyNewClassName and call the Updater in each continue button using Call Updater.SubNameWithAddValues(q1,q2,...qp). It won't matter where you are in your program, since you are using a specific class. The one thing I noticed with my code is that you can only use the line that adds the "key" (the username) once, so use it after the last question: put it in a Sub Finished in your new class and call it as Call Updater.Finished(username,q30,q31,last)
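The "class independent of any form" idea could be sketched as below. This is an illustration under assumptions, not the code described above: the class name QuestionnaireResults and its members are hypothetical, and it keys the dictionary by form name so that each window gets its own record. Shared members live for the lifetime of the application, so any window can reach them without passing an object around.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical class name: one shared store for the whole application.
Public Class QuestionnaireResults
    Private Shared ReadOnly _answers As New Dictionary(Of String, List(Of String))

    ' Stores (or overwrites) the responses collected on one window.
    Public Shared Sub Record(formName As String, ParamArray responses As String())
        _answers(formName) = New List(Of String)(responses)
    End Sub

    ' Builds one comma-separated output line per window: form name, then responses.
    Public Shared Function ToLines() As List(Of String)
        Dim lines As New List(Of String)
        For Each entry As KeyValuePair(Of String, List(Of String)) In _answers
            lines.Add(entry.Key & "," & String.Join(",", entry.Value))
        Next
        Return lines
    End Function
End Class
```

Each submit button would call something like QuestionnaireResults.Record("page1", q1response, q2response), and after the last question the result of ToLines() could be passed to IO.File.WriteAllLines. Using the indexer instead of Add means that recording the same form twice replaces the earlier record rather than throwing.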

Reading from a list in a text file, ripping a certain line and showing it in a listbox, Visual Studio 2010

I'm creating a barcode scanning program in Visual Studio 2010 using VB.
I have come pretty far, but seem to have got stuck on this little problem.
I have a text file saved with the data in it laid out like this:
0001#Unsmoked Middle Bacon
0002#Smoked Middle bacon
0003#Unsmoked Bits
0004#Smoked Bits
0005#Unsmoked Back
0006#Smoked Back
0007#Unsmoked Streaky
0008#Smoked Streaky
I have no problem reading and splitting the strings on #, and I can populate 2 listboxes, 1 displaying the 4-digit code and the other the product name. (This was just a test scenario.)
What I really want to do is search the file for a user-entered number such as "0004" and have it display "Smoked Bits" back to me.
I think I want to read down line by line until it hits the right number, then read across, maybe using a substring? You guys could probably help me a lot here.
While Not sreader.EndOfStream
    lineIn = sreader.ReadLine()
    Dim elements() As String = Nothing
    elements = lineIn.Split("#")
    lstProdTest.Items.Add(elements(0))
    lstProdName.Items.Add(elements(1))
    PLUnumber(index) = elements(0)
    itemName(index) = elements(1)
    numProds = numProds + 1
    index = index + 1
End While
As Origin says, provided the file isn't so large that it consumes too much memory, reading the data once is the way to go:
Private _barcodes As Dictionary(Of Integer, String)
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    'EDIT forgot to initialize _barcodes:
    _barcodes = New Dictionary(Of Integer, String)
    For Each line In IO.File.ReadAllLines("c:\path\to\file.txt")
        Dim data = line.Split("#"c)
        _barcodes.Add(CInt(data(0)), data(1))
    Next
End Sub
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim input As String = InputBox("type the barcode to test, eg 0004:")
    Dim key As Integer = CInt(input)
    'if you entered 0004 then this will display Smoked Bits
    If _barcodes.ContainsKey(key) Then
        MessageBox.Show(_barcodes(key))
    Else
        MessageBox.Show("Key not found")
    End If
End Sub
Note this is just a quick example and would require error handling to be added (for a missing file, incorrectly formatted data, etc.).
If the amount of data is huge, then consider a database instead; SQLite would be a simple option.
As they say, premature optimization is the root of all evil. Instead of reading your file each time you need an item description, you should read the file in once (at the start of the application), store it in memory (perhaps as a Dictionary(Of Integer, String)) and then reference this when trying to get the description for an item.
You could of course go further and create a custom class to store additional information about each entry.
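A custom class along those lines might look like the sketch below. The names Product, ProductCatalog and ParseProducts are hypothetical. As a deliberate difference from the Integer key used above, the code is kept as a String here so that leading zeros such as "0004" survive unchanged; that is a design assumption, not something the answers prescribe.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical class: add further properties (price, weight, ...) as needed.
Public Class Product
    Public Property Code As String
    Public Property Name As String
End Class

Module ProductCatalog
    ' Parses lines such as "0004#Smoked Bits" into a lookup keyed by code.
    ' Lines without a "#" separator are skipped.
    Public Function ParseProducts(lines As IEnumerable(Of String)) As Dictionary(Of String, Product)
        Dim products As New Dictionary(Of String, Product)
        For Each line As String In lines
            ' Split on the first "#" only, so names may contain "#" themselves.
            Dim parts As String() = line.Split(New Char() {"#"c}, 2)
            If parts.Length < 2 Then Continue For
            Dim code As String = parts(0).Trim()
            products(code) = New Product With {.Code = code, .Name = parts(1).Trim()}
        Next
        Return products
    End Function
End Module
```

The lookup would be filled once at start-up, for example from IO.File.ReadAllLines, and queried with TryGetValue when the user enters a code.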
