Let's say I have a list of student cities, and its size could be 100 or 1,000. I want to filter out all duplicate cities.
I want a generic solution that I can use to remove all duplicate strings from any slice.
I am new to the Go language, so I tried to do it by looping and checking whether the element exists using another loop.
Students' Cities List (Data):
studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}
Functions that I created, and they're doing the job:
func contains(s []string, e string) bool {
    for _, a := range s {
        if a == e {
            return true
        }
    }
    return false
}

func removeDuplicates(strList []string) []string {
    list := []string{}
    for _, item := range strList {
        fmt.Println(item)
        if !contains(list, item) {
            list = append(list, item)
        }
    }
    return list
}
My solution test
func main() {
    studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

    uniqueStudentsCities := removeDuplicates(studentsCities)

    fmt.Println(uniqueStudentsCities) // Expected output: [Mumbai Delhi Ahmedabad Bangalore Kolkata Pune]
}
I believe that the above solution is not optimal. Therefore, I need your help to suggest the fastest way to remove duplicates from a slice.
I checked Stack Overflow; this question hadn't been asked yet, so I didn't find a solution.
I found Burak's and Fazlan's solutions helpful. Based on them, I implemented simple functions that help to remove or filter duplicate data from slices of strings, integers, or any other type, with a generic approach.
Here are my three functions: the first is generic, the second is for slices of strings, and the last is for slices of integers. You pass in your data and get back all the unique values as a result.
Generic solution (Go v1.18+):
func removeDuplicate[T string | int](sliceList []T) []T {
    allKeys := make(map[T]bool)
    list := []T{}
    for _, item := range sliceList {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}
To remove duplicate strings from a slice:
func removeDuplicateStr(strSlice []string) []string {
    allKeys := make(map[string]bool)
    list := []string{}
    for _, item := range strSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}
To remove duplicate integers from a slice:
func removeDuplicateInt(intSlice []int) []int {
    allKeys := make(map[int]bool)
    list := []int{}
    for _, item := range intSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}
You can update the slice type, and it will filter out all duplicate data for all types of slices.
Here is the GoPlayground link: https://go.dev/play/p/iyb97KcftMa
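For example, here is a minimal usage sketch of the generic function (the sample data is just for illustration):

func main() {
    cities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}
    ids := []int{1, 2, 2, 3, 3, 3}

    // The type argument is inferred from the element type of each slice.
    fmt.Println(removeDuplicate(cities)) // [Mumbai Delhi Ahmedabad Bangalore Kolkata Pune]
    fmt.Println(removeDuplicate(ids))    // [1 2 3]
}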
Adding this answer, which worked for me; it does require/include sorting, however.
func removeDuplicateStrings(s []string) []string {
    if len(s) < 1 {
        return s
    }

    sort.Strings(s)
    prev := 1
    for curr := 1; curr < len(s); curr++ {
        if s[curr-1] != s[curr] {
            s[prev] = s[curr]
            prev++
        }
    }

    return s[:prev]
}
For fun, I tried using generics! (Go 1.18+ only)
type SliceType interface {
    ~string | ~int | ~float64 // add more *ordered* types as needed (sorting requires <)
}

func removeDuplicates[T SliceType](s []T) []T {
    if len(s) < 1 {
        return s
    }

    // sort
    sort.SliceStable(s, func(i, j int) bool {
        return s[i] < s[j]
    })

    prev := 1
    for curr := 1; curr < len(s); curr++ {
        if s[curr-1] != s[curr] {
            s[prev] = s[curr]
            prev++
        }
    }

    return s[:prev]
}
Go Playground Link with tests: https://go.dev/play/p/bw1PP1osJJQ
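Because the constraint uses ~, a slice of a locally defined type also works; a small sketch, assuming a hypothetical City type:

type City string

func main() {
    cities := []City{"Mumbai", "Delhi", "Mumbai"}
    fmt.Println(removeDuplicates(cities)) // [Delhi Mumbai] (the slice is sorted first)
}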
You can do in-place replacement guided by a map:
processed := map[string]struct{}{}
w := 0
for _, s := range cities {
    if _, exists := processed[s]; !exists {
        // If this city has not been seen yet, add it to the list
        processed[s] = struct{}{}
        cities[w] = s
        w++
    }
}
cities = cities[:w]
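A minimal sketch wrapping that snippet in a reusable function (the name dedupInPlace is my own, not from the answer):

func dedupInPlace(cities []string) []string {
    processed := map[string]struct{}{}
    w := 0
    for _, s := range cities {
        if _, exists := processed[s]; !exists {
            // First time this city is seen: keep it by writing it at the write index.
            processed[s] = struct{}{}
            cities[w] = s
            w++
        }
    }
    return cities[:w]
}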
To reduce memory usage:
package main

import (
    "fmt"
    "reflect"
)

type void struct{}

func main() {
    digits := [6]string{"one", "two", "three", "four", "five", "five"}

    set := make(map[string]void)
    for _, element := range digits {
        set[element] = void{}
    }

    fmt.Println(reflect.ValueOf(set).MapKeys())
}
p.s. playground
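If you need the unique values back as a slice instead of printing the map keys through reflect, a plain range over the set works as well; a small sketch reusing the void type from above:

func keys(set map[string]void) []string {
    unique := make([]string, 0, len(set))
    for element := range set {
        unique = append(unique, element)
    }
    return unique // note: map iteration order is not deterministic
}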
Simple to understand.
func RemoveDuplicate(array []string) []string {
    m := make(map[string]string)
    for _, x := range array {
        m[x] = x
    }
    var ClearedArr []string
    for x := range m {
        ClearedArr = append(ClearedArr, x)
    }
    return ClearedArr
}
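Note that the result is built by ranging over a map, so the order of the returned elements is not guaranteed. A quick usage sketch (the sample data is only illustrative):

func main() {
    cities := []string{"Mumbai", "Delhi", "Mumbai", "Pune"}
    fmt.Println(RemoveDuplicate(cities)) // e.g. [Pune Mumbai Delhi]; order varies between runs
}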
If you don't want to waste memory allocating another slice to copy the values into, you can remove the values in place, as follows:
package main

import "fmt"

var studentsCities = []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

func contains(s []string, e string) bool {
    for _, a := range s {
        if a == e {
            return true
        }
    }
    return false
}

func main() {
    fmt.Printf("Cities before remove: %+v\n", studentsCities)

    for i := 0; i < len(studentsCities); i++ {
        if contains(studentsCities[i+1:], studentsCities[i]) {
            studentsCities = remove(studentsCities, i)
            i--
        }
    }

    fmt.Printf("Cities after remove: %+v\n", studentsCities)
}

func remove(slice []string, s int) []string {
    return append(slice[:s], slice[s+1:]...)
}
Result:
Cities before remove: [Mumbai Delhi Ahmedabad Mumbai Bangalore Delhi Kolkata Pune]
Cities after remove: [Ahmedabad Mumbai Bangalore Delhi Kolkata Pune]
It can also be done with a set-like map:
ddpStrings := []string{}
m := map[string]struct{}{}

for _, s := range strs {
    if _, ok := m[s]; ok {
        continue
    }
    ddpStrings = append(ddpStrings, s)
    m[s] = struct{}{}
}
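Wrapped in a function for completeness (the name dedupStrings and the parameter name are my own choices):

func dedupStrings(in []string) []string {
    ddpStrings := []string{}
    m := map[string]struct{}{}
    for _, s := range in {
        if _, ok := m[s]; ok {
            continue // already seen, skip
        }
        ddpStrings = append(ddpStrings, s)
        m[s] = struct{}{}
    }
    return ddpStrings
}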
func UniqueNonEmptyElementsOf(s []string) []string {
    unique := make(map[string]bool, len(s))
    var us []string
    for _, elem := range s {
        if len(elem) != 0 {
            if !unique[elem] {
                us = append(us, elem)
                unique[elem] = true
            }
        }
    }
    return us
}
Send the slice with duplicates to the above function; it will return a slice with only the unique elements.
func main() {
    studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

    uniqueStudentsCities := UniqueNonEmptyElementsOf(studentsCities)

    fmt.Println(uniqueStudentsCities)
}
Here's a mapless, index-based duplicate "remover"/trimmer for slices. It uses a sort method.
The n value always ends up 1 lower than the total number of non-duplicate elements, because this method compares the current (consecutive/single) element with the next (consecutive/single) element, and there is no match after the last one, so you have to pad n by one to include the last element.
Note that this snippet doesn't clear the leftover duplicate elements to a nil value. However, since the indexes from n+1 onward hold the duplicated items, you can loop from that index and nil out the rest of the elements.
sort.Strings(strs)

for n, i := 0, 0; ; {
    if strs[n] != strs[i] {
        if i-n > 1 {
            strs[n+1] = strs[i]
        }
        n++
    }
    i++
    if i == len(strs) {
        if n != i {
            strs = strs[:n+1]
        }
        break
    }
}

fmt.Println(strs)
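For reference, a runnable sketch of the same logic wrapped in a function with sample data (the function name trimDuplicates and the data are assumptions for illustration):

func trimDuplicates(strs []string) []string {
    if len(strs) == 0 {
        return strs // guard against empty input
    }
    sort.Strings(strs)
    for n, i := 0, 0; ; {
        if strs[n] != strs[i] {
            if i-n > 1 {
                strs[n+1] = strs[i]
            }
            n++
        }
        i++
        if i == len(strs) {
            if n != i {
                strs = strs[:n+1]
            }
            break
        }
    }
    return strs
}

func main() {
    cities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}
    fmt.Println(trimDuplicates(cities)) // [Ahmedabad Bangalore Delhi Kolkata Mumbai Pune]
}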
Based on Riyaz's solution, you can use generics since Go 1.18:
func removeDuplicate[T string | int](tSlice []T) []T {
    allKeys := make(map[T]bool)
    list := []T{}
    for _, item := range tSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}
Generics minimize code duplication.
Go Playground link : https://go.dev/play/p/Y3fEtHJpP7Q
So far, #snassr has given the best answer, as it is the most optimized approach in terms of memory (no extra memory) and runtime (n log n). But one thing I want to emphasize here: if we want to delete an index/element of a slice, we should loop from end to start, as it reduces complexity. If we loop from start to end and delete the nth index, we will accidentally skip the element that was at index n+1, because it shifts into index n while the next iteration moves on to index n+1.
Example Code
func Dedup(strs []string) []string {
    sort.Strings(strs)
    for i := len(strs) - 1; i > 0; i-- {
        if strs[i] == strs[i-1] {
            strs = append(strs[:i], strs[i+1:]...)
        }
    }
    return strs
}
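A quick usage sketch, given that the function returns the shortened slice as above (the sample data is just for illustration):

func main() {
    cities := []string{"Mumbai", "Delhi", "Mumbai", "Delhi", "Pune"}
    cities = Dedup(cities)
    fmt.Println(cities) // [Delhi Mumbai Pune]
}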
Try https://github.com/samber/lo#uniq:
names := lo.Uniq[string]([]string{"Samuel", "John", "Samuel"})
// []string{"Samuel", "John"}
What I'm trying to do:
Transform all arrays of length 1 in a JSON file into non-arrays.
e.g.
Input: {"path": [{"secret/foo": [{"capabilities": ["read"]}]}]}
Output: {"path": {"secret/foo": {"capabilities": "read"}}}
I can't use structs, as the JSON format will vary...
Right now I've managed to at least detect the 1-length slices:
package main

import (
    "encoding/json"
    "fmt"
)

func findSingletons(value interface{}) {
    switch value.(type) {
    case []interface{}:
        if len(value.([]interface{})) == 1 {
            fmt.Println("1 length array found!", value)
        }
        for _, v := range value.([]interface{}) {
            findSingletons(v)
        }
    case map[string]interface{}:
        for _, v := range value.(map[string]interface{}) {
            findSingletons(v)
        }
    }
}

func removeSingletonsFromJSON(input string) {
    jsonFromInput := json.RawMessage(input)

    jsonMap := make(map[string]interface{})
    err := json.Unmarshal([]byte(jsonFromInput), &jsonMap)
    if err != nil {
        panic(err)
    }

    findSingletons(jsonMap)

    fmt.Printf("JSON value of without singletons:%s\n", jsonMap)
}

func main() {
    jsonParsed := []byte(`{"path": [{"secret/foo": [{"capabilities": ["read"]}]}]}`)

    removeSingletonsFromJSON(string(jsonParsed))

    fmt.Println(`Should have output {"path": {"secret/foo": {"capabilities": "read"}}}`)
}
Which outputs
1 length array found! [map[secret/foo:[map[capabilities:[read]]]]]
1 length array found! [map[capabilities:[read]]]
1 length array found! [read]
JSON value of without singletons:map[path:[map[secret/foo:[map[capabilities:[read]]]]]]
Should have output {"path": {"secret/foo": {"capabilities": "read"}}}
But I'm not sure how I can change them into non-arrays...
The type switch is your friend:
switch t := v.(type) {
case []interface{}:
    if len(t) == 1 {
        data[k] = t[0]
And you can use recursion to remove nested elements, like so:
func removeOneElementSlice(data map[string]interface{}) {
    for k, v := range data {
        switch t := v.(type) {
        case []interface{}:
            if len(t) == 1 {
                data[k] = t[0]
                if v, ok := data[k].(map[string]interface{}); ok {
                    removeOneElementSlice(v)
                }
            }
        }
    }
}
I would do this to convert
{"path":[{"secret/foo":[{"capabilities":["read"]}]}]}
to
{"path":{"secret/foo":{"capabilities":"read"}}}:
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

func main() {
    s := `{"path":[{"secret/foo":[{"capabilities":["read"]}]}]}`
    fmt.Println(s)
    var data map[string]interface{}
    if err := json.Unmarshal([]byte(s), &data); err != nil {
        panic(err)
    }
    removeOneElementSlice(data)
    buf, err := json.Marshal(data)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(buf)) // {"path":{"secret/foo":{"capabilities":"read"}}}
}

func removeOneElementSlice(data map[string]interface{}) {
    for k, v := range data {
        switch t := v.(type) {
        case []interface{}:
            if len(t) == 1 {
                data[k] = t[0]
                if v, ok := data[k].(map[string]interface{}); ok {
                    removeOneElementSlice(v)
                }
            }
        }
    }
}
var bar string
var i int
var a []string

for foo, _ := reader.NextWord(); foo != bar; foo, _ = reader.NextWord() {
    bar = foo
    fmt.Print(foo)
    a[i] = foo
    i++
}
Shouldn't this create a nil slice and then add the value at the appropriate place? I keep getting "index out of range", so I assume it's not adding to a[i]...
Checking length first with
if len(a) > 0 {
    a[i] = foo
}
seems to help, but I'm not getting the results I expected. I'll keep playing around.
Update: I did end up using append... I meant to update this thread but thank you both.
package main

import (
    "fmt"
    "log"
    "os"
    "strings"

    "github.com/steven-ferrer/gonsole"
)

func main() {
    file, err := os.Open("test.txt")
    if err != nil {
        log.Println(err)
    }
    defer file.Close()

    reader := gonsole.NewReader(file)

    // cycle through
    var bar string
    var i int
    var quality []string = make([]string, 0)
    var tempName []string = make([]string, 0)
    var name []string = make([]string, 0)
    for foo, _ := reader.NextWord(); foo != bar; foo, _ = reader.NextWord() {
        bar = foo
        if strings.Contains(foo, "(normal)") {
            quality = append(quality, "normal")
            for state := 0; state < 1; foo, _ = reader.NextWord() {
                if foo == "|" {
                    state = 1
                }
                tempName = append(tempName, foo)
            }
            nameString := strings.Join(tempName, "")
            name = append(name, nameString)
        } else if strings.Contains(foo, "(unique)") {
            quality = append(quality, "unique")
            for state := 0; state < 1; foo, _ = reader.NextWord() {
                if foo == "|" {
                    state = 1
                }
                tempName = append(tempName, foo)
            }
            nameString := strings.Join(tempName, "")
            name = append(name, nameString)
        } else if strings.Contains(foo, "(set)") {
            quality = append(quality, "set")
            for state := 0; state < 1; foo, _ = reader.NextWord() {
                if foo == "|" {
                    state = 1
                }
                tempName = append(tempName, foo)
            }
            nameString := strings.Join(tempName, "")
            name = append(name, nameString)
        }
        if tempName != nil {
            tempName = nil // clear tempName
        }
        i++
    }
}
Your slice a needs to be allocated using make:
var a []string = make([]string, n)
where n is the size of the slice.
Setting aside the context-specific parts of your code, you should be using append with a dynamic-length slice.
package main

import (
    "fmt"
    "strings"
)

func main() {
    book := "Lorem ipsum dolor sit amet"

    var words []string

    for _, word := range strings.Split(book, " ") {
        words = append(words, word)
    }

    fmt.Printf("%+v\n", words)
}
https://play.golang.org/p/LMejsrmIGb9
If you know the number of values up front, the same can be achieved with a fixed-length slice by using words := make([]string, 5), but I doubt that's what you want in this case.
The reason your code causes errors is that your slice isn't initialized to any length, so the indexes don't exist yet. Generally, when working with a slice, append is the method you want.
In contrast, when working with existing slices (i.e., ranging over a slice), you're able to set values by index because those indexes have already been allocated.
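For contrast, a minimal sketch of the fixed-length variant mentioned above, where assigning by index works because make has already allocated every element:

package main

import (
    "fmt"
    "strings"
)

func main() {
    book := "Lorem ipsum dolor sit amet"
    parts := strings.Split(book, " ")

    // Allocate the exact length up front, so every index 0..len-1 is valid.
    words := make([]string, len(parts))
    for i, word := range parts {
        words[i] = word
    }

    fmt.Printf("%+v\n", words)
}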
I need to compute the hash (MD5 is OK) for a large number of files. So, in Go I have this code:
package main

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
    "io"
    "os"
    "path/filepath"
)

func strSliceRemove(slice []string, str string) []string {
    var tempSlice []string
    for _, item := range slice {
        if item != str {
            tempSlice = append(tempSlice, item)
        }
    }
    return tempSlice
}

func fileMD5(path string) (string, error) {
    var returnMD5String string

    file, err := os.Open(path)
    if err != nil {
        return returnMD5String, err
    }
    defer file.Close()

    hash := md5.New()
    if _, err := io.Copy(hash, file); err != nil {
        return returnMD5String, err
    }

    hashInBytes := hash.Sum(nil)[:16]
    returnMD5String = hex.EncodeToString(hashInBytes)
    return returnMD5String, nil
}

func main() {
    var doRead func(string)
    doRead = func(sd string) {
        filepath.Walk(sd, func(path string, f os.FileInfo, err error) error {
            resolvedPath, resolvedPathErr := filepath.EvalSymlinks(path)
            if resolvedPathErr != nil {
                return nil
            }

            if f.Mode()&os.ModeSymlink == os.ModeSymlink {
                doRead(resolvedPath)
            } else {
                if !f.IsDir() {
                    md5, _ := fileMD5(path)
                    fmt.Printf("%s\n", md5)
                }
            }

            return nil
        })
    }
    doRead("/tmp/electron")
    return
}
It correctly hashes 1,400 files in about one second. If I use my OS X md5 command-line utility, it takes more than 10 times as long; it is 10 times slower:
for FILE in `find /tmp/electron`; do
    if [ ! -d "$FILE" ]; then
        md5 $FILE;
    fi;
done;
I tried a basic C program that does the same (based on this answer: How to calculate the MD5 hash of a large file in C?), and the time is still roughly 10 seconds.
What kind of strategy / library does crypto/md5 use?
I'm trying to read a directory and build a JSON string from the file entries. But the json Encoder's Encode() function outputs only empty objects. For the test I have two files in a tmp dir:
test1.js test2.js
The Go program is this:
package main

import (
    "encoding/json"
    "fmt"
    "os"
    "path/filepath"
    "time"
)

type File struct {
    name      string
    timeStamp int64
}

func main() {
    files := make([]File, 0, 20)

    filepath.Walk("/home/michael/tmp/", func(path string, f os.FileInfo, err error) error {
        if f == nil {
            return nil
        }
        name := f.Name()
        if len(name) > 3 {
            files = append(files, File{
                name:      name,
                timeStamp: f.ModTime().UnixNano() / int64(time.Millisecond),
            })
            // grow array if needed
            if cap(files) == len(files) {
                newFiles := make([]File, len(files), cap(files)*2)
                for i := range files {
                    newFiles[i] = files[i]
                }
                files = newFiles
            }
        }
        return nil
    })

    fmt.Println(files)

    encoder := json.NewEncoder(os.Stdout)
    encoder.Encode(&files)
}
And the output that it produces is:
[{test1.js 1444549471481} {test2.js 1444549481017}]
[{},{}]
Why is the JSON string empty?
It doesn't work because none of the fields in the File struct are exported.
The following works just fine:
package main

import (
    "encoding/json"
    "fmt"
    "os"
    "path/filepath"
    "time"
)

type File struct {
    Name      string
    TimeStamp int64
}

func main() {
    files := make([]File, 0, 20)

    filepath.Walk("/tmp/", func(path string, f os.FileInfo, err error) error {
        if f == nil {
            return nil
        }
        name := f.Name()
        if len(name) > 3 {
            files = append(files, File{
                Name:      name,
                TimeStamp: f.ModTime().UnixNano() / int64(time.Millisecond),
            })
            // grow array if needed
            if cap(files) == len(files) {
                newFiles := make([]File, len(files), cap(files)*2)
                for i := range files {
                    newFiles[i] = files[i]
                }
                files = newFiles
            }
        }
        return nil
    })

    fmt.Println(files)

    encoder := json.NewEncoder(os.Stdout)
    encoder.Encode(&files)
}
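If you also want lowercase keys in the JSON output (like your original field names), you can keep the fields exported and add json struct tags; a minimal sketch of just the type (the tag names are my own choice):

type File struct {
    Name      string `json:"name"`
    TimeStamp int64  `json:"timeStamp"`
}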