I'm new to Go, so please excuse my ignorance. I'm trying to iterate through a bunch of wordlists line by line, indefinitely, with goroutines. But when I try, it either does not iterate at all or stops halfway through. How would I go about this properly without breaking the flow?
package main
import (
"bufio"
"fmt"
"os"
)
var file, _ = os.Open("wordlist.txt")
func start() {
scanner := bufio.NewScanner(file)
for scanner.Scan() {
fmt.Println(scanner.Text())
}
}
func main(){
for t := 0; t < 150; t++ {
go start()
fmt.Scanln()
}
}
Thank you!
You declare file as a global variable. Sharing read/write file state amongst multiple goroutines is a data race and will give you undefined results.
Most likely, reads start where the last read from any of the goroutines left off. If that's end-of-file, it likely continues to be end-of-file. But, since the results are undefined, that's not guaranteed. Your erratic results are due to undefined behavior.
Here's a revised version of your program that declares a local file variable and uses a sync.WaitGroup to synchronize the completion of all the go start() goroutines with the main goroutine. The program also checks for errors.
package main
import (
"bufio"
"fmt"
"os"
"sync"
)
func start(filename string, wg *sync.WaitGroup, t int) {
defer wg.Done()
file, err := os.Open(filename)
if err != nil {
fmt.Println(err)
return
}
defer file.Close()
lines := 0
scanner := bufio.NewScanner(file)
for scanner.Scan() {
lines++
}
if err := scanner.Err(); err != nil {
fmt.Println(err)
return
}
fmt.Println(t, lines)
}
func main() {
wg := &sync.WaitGroup{}
filename := "wordlist.txt"
for t := 0; t < 150; t++ {
wg.Add(1)
go start(filename, wg, t)
}
wg.Wait()
}
I have an array of strings, and I'd like to exclude values that start with foo_ OR are longer than 7 characters.
I can loop through each element, run the if statement, and add it to a slice along the way. But I was curious if there was an idiomatic or more golang-like way of accomplishing that.
Just for example, the same thing might be done in Ruby as
my_array.select! { |val| val !~ /^foo_/ && val.length <= 7 }
There is no one-liner as you have it in Ruby, but with a helper function you can make it almost as short.
Here's our helper function that loops over a slice, and selects and returns only the elements that meet a criterion captured by a function value:
func filter(ss []string, test func(string) bool) (ret []string) {
for _, s := range ss {
if test(s) {
ret = append(ret, s)
}
}
return
}
Starting with Go 1.18, we can write it generic so it will work with all types, not just string:
func filter[T any](ss []T, test func(T) bool) (ret []T) {
for _, s := range ss {
if test(s) {
ret = append(ret, s)
}
}
return
}
Using this helper function, your task becomes:
ss := []string{"foo_1", "asdf", "loooooooong", "nfoo_1", "foo_2"}
mytest := func(s string) bool { return !strings.HasPrefix(s, "foo_") && len(s) <= 7 }
s2 := filter(ss, mytest)
fmt.Println(s2)
Output (try it on the Go Playground, or the generic version: Go Playground):
[asdf nfoo_1]
Note:
If it is expected that many elements will be selected, it might be profitable to allocate a "big" ret slice beforehand, and use simple assignment instead of the append(). And before returning, slice the ret to have a length equal to the number of selected elements.
Note #2:
In my example I chose a test() function which tells if an element is to be returned. So I had to invert your "exclusion" condition. Obviously you may write the helper function to expect a tester function which tells what to exclude (and not what to include).
Have a look at robpike's filter library. This would allow you to do:
package main
import (
"fmt"
"strings"
"robpike.io/filter"
)
func isNoFoo7(a string) bool {
return !strings.HasPrefix(a, "foo_") && len(a) <= 7
}
func main() {
a := []string{"test", "some_other_test", "foo_etc"}
result := filter.Choose(a, isNoFoo7)
fmt.Println(result) // [test]
}
Interestingly enough, Rob's README.md says:
I wanted to see how hard it was to implement this sort of thing in Go, with as nice an API as I could manage. It wasn't hard.
Having written it a couple of years ago, I haven't had occasion to use it once. Instead, I just use "for" loops.
You shouldn't use it either.
So the most idiomatic way according to Rob would be something like:
func main() {
a := []string{"test", "some_other_test", "foo_etc"}
nofoos := []string{}
for i := range a {
if !strings.HasPrefix(a[i], "foo_") && len(a[i]) <= 7 {
nofoos = append(nofoos, a[i])
}
}
fmt.Println(nofoos) // [test]
}
This style is very similar, if not identical, to the approach any C-family language takes.
Today, I stumbled on a pretty idiom that surprised me. If you want to filter a slice in place without allocating, use two slices with the same backing array:
s := []T{
// the input
}
s2 := s
s = s[:0]
for _, v := range s2 {
if shouldKeep(v) {
s = append(s, v)
}
}
Here's a specific example of removing consecutive duplicate strings (the input must be sorted for this to remove every duplicate):
s := []string{"a", "a", "b", "c", "c"}
s2 := s
s = s[:0]
var last string
for _, v := range s2 {
if len(s) == 0 || v != last {
last = v
s = append(s, v)
}
}
If you need to keep both slices, simply replace s = s[:0] with s = nil or s = make([]T, 0, len(s)), depending on whether you want append() to allocate for you.
There are a couple of nice ways to filter a slice without allocations or new dependencies. Found in the Go wiki on GitHub:
Filter (in place)
n := 0
for _, x := range a {
if keep(x) {
a[n] = x
n++
}
}
a = a[:n]
And another, more readable, way:
Filtering without allocating
This trick uses the fact that a slice shares the same backing array
and capacity as the original, so the storage is reused for the
filtered slice. Of course, the original contents are modified.
b := a[:0]
for _, x := range a {
if f(x) {
b = append(b, x)
}
}
For elements which must be garbage collected, the following code can
be included afterwards:
for i := len(b); i < len(a); i++ {
a[i] = nil // or the zero value of T
}
One thing I'm not sure about is whether the first method needs clearing (setting to nil) the items in slice a after index n, like they do in the second method.
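As far as I can tell, the same consideration applies to the first method: re-slicing with a[:n] does not touch the backing array, so the slots from index n onward still hold their old values, and for pointer elements those values keep the discarded items reachable. A minimal sketch of the first method with the tail cleared, using a hypothetical item type and function name:

```go
package main

import "fmt"

// item is a placeholder element type for this sketch.
type item struct{ name string }

// filterInPlace applies the index-based in-place filter and then clears the
// unused tail of the backing array.
func filterInPlace(a []*item, keep func(*item) bool) []*item {
	n := 0
	for _, x := range a {
		if keep(x) {
			a[n] = x
			n++
		}
	}
	// Slots n..len(a)-1 still hold their old pointers; nil them so the
	// discarded items are not kept reachable through the backing array.
	for i := n; i < len(a); i++ {
		a[i] = nil
	}
	return a[:n]
}

func main() {
	a := []*item{{"keep1"}, {"drop"}, {"keep2"}}
	full := a // second view of the same backing array, to inspect the tail
	a = filterInPlace(a, func(it *item) bool { return it.name != "drop" })
	fmt.Println(len(a), a[0].name, a[1].name) // 2 keep1 keep2
	fmt.Println(full[2] == nil)               // true
}
```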
EDIT: the second way is basically what MicahStetson described in his answer. In my code I use a function similar to the following, which is probably as good as it gets in terms of performance and readability:
func filterSlice(slice []*T, keep func(*T) bool) []*T {
newSlice := slice[:0]
for _, item := range slice {
if keep(item) {
newSlice = append(newSlice, item)
}
}
// make sure discarded items can be garbage collected
for i := len(newSlice); i < len(slice); i++ {
slice[i] = nil
}
return newSlice
}
Note that if items in your slice are not pointers and don't contain pointers you can skip the second for loop.
There isn't an idiomatic way to achieve the same result in a single line in Go as in Ruby, but with a helper function you can get the same expressiveness.
You can call this helper function as:
Filter(strs, func(v string) bool {
return strings.HasPrefix(v, "foo_") // matches foo_testfor
})
Here is the whole code:
package main
import "strings"
import "fmt"
// Filter returns a new slice containing all strings in vs that
// satisfy the predicate `f` and are longer than 7 characters.
func Filter(vs []string, f func(string) bool) []string {
vsf := make([]string, 0)
for _, v := range vs {
if f(v) && len(v) > 7 {
vsf = append(vsf, v)
}
}
return vsf
}
func main() {
var strs = []string{"foo1", "foo2", "foo3", "foo3", "foo_testfor", "_foo"}
fmt.Println(Filter(strs, func(v string) bool {
return strings.HasPrefix(v, "foo_") // return foo_testfor
}))
}
And the running example: Playground
You can use the loop as you did and wrap it in a utility function for reuse.
To support multiple data types, copy-pasting is one option. Another is to write a code generation tool.
Finally, if you'd rather use a library, take a look at https://github.com/ledongthuc/goterators#filter, which I created to provide reusable aggregate and transform functions.
It requires Go 1.18 or later, because it relies on generics to work with whatever element type you need.
filteredItems, err := Filter(list, func(item int) bool {
return item % 2 == 0
})
filteredItems, err := Filter(list, func(item string) bool {
return strings.Contains(item, "ValidWord")
})
filteredItems, err := Filter(list, func(item MyStruct) bool {
return item.Valid()
})
It also supports Reduce in case you want to optimize the way you select.
Hope it's useful to you!
"Select Elements from Array" is also commonly called a filter function. There's no such thing in Go. There are also no other "Collection Functions" such as map or reduce. For the most idiomatic way to get the desired result, I find https://gobyexample.com/collection-functions a good reference:
[...] in Go it’s common to provide collection functions if and when they are specifically needed for your program and data types.
They provide an implementation example of the filter function for strings:
func Filter(vs []string, f func(string) bool) []string {
vsf := make([]string, 0)
for _, v := range vs {
if f(v) {
vsf = append(vsf, v)
}
}
return vsf
}
However, they also say that it's often OK to just inline the code:
Note that in some cases it may be clearest to just inline the
collection-manipulating code directly, instead of creating and calling
a helper function.
In general, Go tries to introduce only orthogonal concepts, meaning that when you can solve a problem one way, there shouldn't be too many other ways to solve it. This keeps the language simple: with only a few core concepts, developers don't each end up using a different subset of the language.
Take a look at this library: github.com/thoas/go-funk
It provides an implementation of a lot of life-saving idioms in Go (including filtering of elements in array for instance).
r := funk.Filter([]int{1, 2, 3, 4}, func(x int) bool {
return x%2 == 0
})
Here is an elegant example of both Fold and Filter that uses recursion to accomplish filtering. FoldRight is also generally useful. It is not stack safe but could be made so with trampolining. Once Golang has generics it can be entirely generalized for any 2 types:
func FoldRightStrings(as, z []string, f func(string, []string) []string) []string {
if len(as) > 1 {//Slice has a head and a tail.
h, t := as[0], as[1:len(as)]
return f(h, FoldRightStrings(t, z, f))
} else if len(as) == 1 {//Slice has a head and an empty tail.
h := as[0]
return f(h, FoldRightStrings([]string{}, z, f))
}
return z
}
func FilterStrings(as []string, p func(string) bool) []string {
var g = func(h string, accum []string) []string {
if p(h) {
return append(accum, h)
} else {
return accum
}
}
return FoldRightStrings(as, []string{}, g)
}
Here is an example of its usage that keeps only the strings with length < 8:
var p = func(s string) bool {
return len(s) < 8
}
FilterStrings([]string{"asd","asdfas","asdfasfsa","asdfasdfsadfsadfad"}, p)
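Since Go 1.18 added generics, the generalization to two types can be sketched as below. The names foldRight and filterFold are assumptions. One deliberate difference: FilterStrings above appends to the accumulator, so kept elements come out in reverse order, whereas this sketch prepends to preserve the input order, at the cost of one allocation per kept element. It is no more stack safe than the original.

```go
package main

import "fmt"

// foldRight is a hypothetical generic version of FoldRightStrings: it combines
// the elements of as from right to left, starting from z.
func foldRight[A, B any](as []A, z B, f func(A, B) B) B {
	if len(as) == 0 {
		return z
	}
	return f(as[0], foldRight(as[1:], z, f))
}

// filterFold keeps the elements that satisfy p, in their original order.
func filterFold[A any](as []A, p func(A) bool) []A {
	return foldRight(as, []A{}, func(h A, accum []A) []A {
		if p(h) {
			// Prepend: accum holds the kept elements to the right of h.
			return append([]A{h}, accum...)
		}
		return accum
	})
}

func main() {
	words := []string{"asd", "asdfas", "asdfasfsa", "asdfasdfsadfsadfad"}
	short := filterFold(words, func(s string) bool { return len(s) < 8 })
	fmt.Println(short) // [asd asdfas]
}
```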
I'm developing this library: https://github.com/jose78/go-collection. Please try this example to filter elements:
package main
import (
"fmt"
col "github.com/jose78/go-collection/collections"
)
type user struct {
name string
age int
id int
}
func main() {
newMap := generateMapTest()
if resultMap, err := newMap.FilterAll(filterEmptyName); err != nil {
fmt.Printf("error")
} else {
fmt.Printf("Result: %v\n", resultMap)
result := resultMap.ListValues()
fmt.Printf("Result: %v\n", result)
fmt.Printf("Result: %v\n", result.Reverse())
fmt.Printf("Result: %v\n", result.JoinAsString(" <---> "))
fmt.Printf("Result: %v\n", result.Reverse().JoinAsString(" <---> "))
result.Foreach(simpleLoop)
err := result.Foreach(simpleLoopWithError)
if err != nil {
fmt.Println(err)
}
}
}
func filterEmptyName(key interface{}, value interface{}) bool {
user := value.(user)
return user.name != "empty"
}
func generateMapTest() (container col.MapType) {
container = col.MapType{}
container[1] = user{"Alvaro", 6, 1}
container[2] = user{"Sofia", 3, 2}
container[3] = user{"empty", 0, -1}
return container
}
var simpleLoop col.FnForeachList = func(mapper interface{}, index int) {
fmt.Printf("%d.- item:%v\n", index, mapper)
}
var simpleLoopWithError col.FnForeachList = func(mapper interface{}, index int) {
if index > 0 {
panic(fmt.Sprintf("Error produced with index == %d\n", index))
}
fmt.Printf("%d.- item:%v\n", index, mapper)
}
Result of execution:
Result: map[1:{Alvaro 6 1} 2:{Sofia 3 2}]
Result: [{Sofia 3 2} {Alvaro 6 1}]
Result: [{Alvaro 6 1} {Sofia 3 2}]
Result: {Sofia 3 2} <---> {Alvaro 6 1}
Result: {Alvaro 6 1} <---> {Sofia 3 2}
0.- item:{Sofia 3 2}
1.- item:{Alvaro 6 1}
0.- item:{Sofia 3 2}
Recovered in f Error produced with index == 1
ERROR: Error produced with index == 1
Error produced with index == 1
The docs are currently located in the wiki section of the project. You can try it at this link. I hope you like it.
Regards.
I'm trying to learn Go (or Golang) and can't seem to get it right. I have two text files, each containing a list of words. I'm trying to count the number of words that are present in both files.
Here is my code so far :
package main
import (
"fmt"
"log"
"net/http"
"bufio"
)
func stringInSlice(str string, list []string) bool {
for _, v := range list {
if v == str {
return true
}
}
return false
}
func main() {
// Texts URL
var list = "https://gist.githubusercontent.com/alexcesaro/c9c47c638252e21bd82c/raw/bd031237a56ae6691145b4df5617c385dffe930d/list.txt"
var url1 = "https://gist.githubusercontent.com/alexcesaro/4ebfa5a9548d053dddb2/raw/abb8525774b63f342e5173d1af89e47a7a39cd2d/file1.txt"
//Create storing arrays
var buffer [2000]string
var bufferUrl1 [40000]string
// Set a sibling counter
var sibling = 0
// Read and store text files
wordList, err := http.Get(list)
if err != nil {
log.Fatalf("Error while getting the url : %v", err)
}
defer wordList.Body.Close()
wordUrl1, err := http.Get(url1)
if err != nil {
log.Fatalf("Error while getting the url : %v", err)
}
defer wordUrl1.Body.Close()
streamList := bufio.NewScanner(wordList.Body)
streamUrl1 := bufio.NewScanner(wordUrl1.Body)
streamList.Split(bufio.ScanLines)
streamUrl1.Split(bufio.ScanLines)
var i = 0;
var j = 0;
//Fill arrays with each lines
for streamList.Scan() {
buffer[i] = streamList.Text()
i++
}
for streamUrl1.Scan() {
bufferUrl1[j] = streamUrl1.Text()
j++
}
//ERROR OCCURRING HERE :
// This code if i'm not wrong is supposed to compare through all the range of bufferUrl1 -> bufferUrl1 values with buffer values, then increment sibling and output FIND
for v := range bufferUrl1{
if stringInSlice(bufferUrl1, buffer) {
sibling++
fmt.Println("FIND")
}
}
// As a testing purpose thoses lines properly paste both array
// fmt.Println(buffer)
// fmt.Println(bufferUrl1)
}
But right now, my build doesn't even succeed. I'm only greeted with this message:
.\hello.go:69: cannot use bufferUrl1 (type [40000]string) as type string in argument to stringInSlice
.\hello.go:69: cannot use buffer (type [2000]string) as type []string in argument to stringInSlice
bufferUrl1 is an array: [40000]string. You meant to pass each string in bufferUrl1 instead. Note that v would not do either: the first variable in a range loop is the index, so what you want is the second variable. The index is ignored in the code below using _.
type [2000]string is different from []string. In Go, arrays and slices are not the same. Read Go Slices: usage and internals. I've changed both variable declarations to use slices with the same initial length using make.
These are the changes you need to make for the code to compile.
Declarations:
// Create storing slices
buffer := make([]string, 2000)
bufferUrl1 := make([]string, 40000)
and the loop on Line 69:
for _, s := range bufferUrl1 {
if stringInSlice(s, buffer) {
sibling++
fmt.Println("FIND")
}
}
As a side-note, consider using a map instead of a slice for buffer for more efficient lookup instead of looping through the list in stringInSlice.
https://play.golang.org/p/UcaSVwYcIw has the fix for the comments below (you won't be able to make HTTP requests from the Playground).