Golang: calculate diff between two arrays of bytes and patch an array - arrays

I'm trying to find the difference between two byte arrays and store the delta.
I've read this documentation https://golang.org/pkg/bytes/ but I didn't find anything that shows how to find the diff.
Thanks.

Sounds like you just want a function which takes two byte slices and returns a new slice containing the element-wise difference of the two inputs. The example function below checks that the input slices are both non-nil and have the same length. It returns a slice of int16s, since the difference between two bytes lies in the range [-255,255] and does not fit in a byte.
package main
import "fmt"
func main() {
bs1 := []byte{0, 2, 255, 0}
bs2 := []byte{0, 1, 0, 255}
delta, err := byteDiff(bs1, bs2)
if err != nil {
panic(err)
}
fmt.Printf("OK: delta=%v\n", delta)
// OK: delta=[0 1 255 -255]
}
func byteDiff(bs1, bs2 []byte) ([]int16, error) {
// Ensure that we have two non-nil slices with the same length.
if (bs1 == nil) || (bs2 == nil) {
return nil, fmt.Errorf("expected a byte slice but got nil")
}
if len(bs1) != len(bs2) {
return nil, fmt.Errorf("mismatched lengths, %d != %d", len(bs1), len(bs2))
}
// Populate and return the difference between the two.
diff := make([]int16, len(bs1))
for i := range bs1 {
diff[i] = int16(bs1[i]) - int16(bs2[i])
}
return diff, nil
}
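The question title also asks about patching. A minimal sketch of the inverse operation, assuming the delta was produced by byteDiff above (so delta[i] = bs1[i] - bs2[i]); the name applyDiff is hypothetical and not part of the original answer:

```go
package main

import "fmt"

// applyDiff is a hypothetical counterpart to byteDiff: given the second
// slice and a delta where delta[i] = bs1[i] - bs2[i], it rebuilds bs1.
func applyDiff(bs2 []byte, delta []int16) ([]byte, error) {
	if len(bs2) != len(delta) {
		return nil, fmt.Errorf("mismatched lengths, %d != %d", len(bs2), len(delta))
	}
	out := make([]byte, len(bs2))
	for i := range bs2 {
		v := int16(bs2[i]) + delta[i]
		// A delta that does not belong to this base slice may leave the byte range.
		if v < 0 || v > 255 {
			return nil, fmt.Errorf("delta[%d]=%d does not fit base byte %d", i, delta[i], bs2[i])
		}
		out[i] = byte(v)
	}
	return out, nil
}

func main() {
	bs2 := []byte{0, 1, 0, 255}
	delta := []int16{0, 1, 255, -255}
	patched, err := applyDiff(bs2, delta)
	if err != nil {
		panic(err)
	}
	fmt.Println(patched) // [0 2 255 0]
}
```

Storing the delta and one of the two slices is enough to recover the other; the range check guards against applying a delta to the wrong base.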

Related

Efficient way of flattening a recursive data structure in golang

I have a recursive data structure that can contain a few different type of data:
type Data interface{
// Some methods
}
type Pair struct { // implements Data
fst Data
snd Data
}
type Number float64 // implements Data
Now I want to flatten a chain of Pairs into a []Data. However, the Data in the fst field should not be flattened, only data in snd should be flattened. E.g:
chain := Pair{Number(1.0), Pair{Number(2.0), Pair{Number(3.0), nil}}}
chain2 := Pair{Pair{Number(1.0), Number(4.0)}, Pair{Number(2.0), Pair{Number(3.0), nil}}}
becomes:
data := []Data{Number(1.0), Number(2.0), Number(3.0)}
data2 := []Data{Pair{Number(1.0), Number(4.0)}, Number(2.0), Number(3.0)}
My naive approach would be:
var data []Data
chain := Pair{Number(1.0), Pair{Number(2.0), Pair{Number(3.0), nil}}}
for chain != nil {
data = append(data, chain.fst)
chain = chain.snd
}
Is there a more efficient approach that can flatten a data structure like the one in the variable chain into an []Data array?
You can use a recursive function. On the way down, add up the number of pairs, at the bottom, allocate the array, and on the way back up, fill the array from back to front.
If you need to support arbitrary trees, you can add a size method to Data, and then do another tree traversal to actually fill the array.
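The count-on-the-way-down, fill-on-the-way-up idea could be sketched like this. It assumes Data is an empty interface (the question elides its methods) and treats a non-nil, non-Pair value at the end of the chain as the last element; flattenChain is a hypothetical name:

```go
package main

import "fmt"

// Data stands in for the question's interface; its methods are elided there,
// so an empty interface is assumed here.
type Data interface{}

type Pair struct {
	fst Data
	snd Data
}

type Number float64

// flattenChain follows the snd links. depth counts the pairs seen so far,
// the slice is allocated once at the end of the chain, and each call
// writes its fst into its own slot while the recursion unwinds.
func flattenChain(d Data, depth int) []Data {
	p, ok := d.(Pair)
	if !ok {
		if d == nil {
			return make([]Data, depth)
		}
		// Assumption: a non-Pair tail becomes the final element.
		out := make([]Data, depth+1)
		out[depth] = d
		return out
	}
	out := flattenChain(p.snd, depth+1)
	out[depth] = p.fst // fst is stored as-is, never flattened
	return out
}

func main() {
	chain := Pair{Number(1.0), Pair{Number(2.0), Pair{Number(3.0), nil}}}
	chain2 := Pair{Pair{Number(1.0), Number(4.0)}, Pair{Number(2.0), Pair{Number(3.0), nil}}}
	fmt.Println(flattenChain(chain, 0))  // [1 2 3]
	fmt.Println(flattenChain(chain2, 0)) // [{1 4} 2 3]
}
```

Compared with repeated append, this performs exactly one allocation, at the cost of one stack frame per pair.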
Huh, your naive approach doesn't work for Pairs nested inside fst. If you had chain := Pair{Pair{Number(1.0), Number(2.0)}, Number(3.0)}, it would end up as []Data{Pair{Number(1.0), Number(2.0)}, Number(3.0)}. This is an inherently recursive problem, so why not implement it as such?
I suggest adding a flatten() method to your interface. Pairs recursively flatten both of their fields, and Numbers just return themselves in a one-element slice.
Here's a fully working example with some minimal testing:
package main
import "fmt"
type Data interface {
flatten() []Data
}
type Pair struct {
fst Data
snd Data
}
type Number float64
func (p Pair) flatten() []Data {
res := []Data{}
if p.fst != nil {
res = append(res, p.fst.flatten()...)
}
if p.snd != nil {
res = append(res, p.snd.flatten()...)
}
return res
}
func (n Number) flatten() []Data {
return []Data{n}
}
func main() {
tests := []Data{
Pair{Number(1.0), Pair{Number(2.0), Pair{Number(3.0), nil}}},
Pair{Pair{Number(1.0), Number(2.0)}, Number(3.0)},
Pair{Pair{Pair{Number(1.0), Number(2.0)}, Pair{Number(3.0), Number(4.0)}}, Pair{Pair{Number(5.0), Number(6.0)}, Number(7.0)}},
Number(1.0),
}
for _, t := range tests {
fmt.Printf("Original: %v\n", t)
fmt.Printf("Flattened: %v\n", t.flatten())
}
}
(This assumes that the top-level input Data is never nil).
The code prints:
Original: {1 {2 {3 <nil>}}}
Flattened: [1 2 3]
Original: {{1 2} 3}
Flattened: [1 2 3]
Original: {{{1 2} {3 4}} {{5 6} 7}}
Flattened: [1 2 3 4 5 6 7]
Original: 1
Flattened: [1]
As suggested, writing a recursive function fits best for this problem. But it's also possible to write a non-recursive version (IMHO recursive version would be more clear):
func flatten(d Data) []Data {
var res []Data
stack := []Data{d}
for len(stack) > 0 {
switch x := stack[len(stack)-1].(type) {
case Pair:
stack[len(stack)-1] = x.snd
stack = append(stack, x.fst)
case Number:
res = append(res, x)
stack = stack[:len(stack)-1]
default:
if x == nil {
stack = stack[:len(stack)-1]
} else {
panic("INVALID TYPE")
}
}
}
return res
}

GoLang: Check if item from Slice 1 contains in Slice 2. If it does, remove Slice 2

I have a string array: slice1 [][]string.
I get the values I want using a for loop:
for _, i := range slice1 { //[string1 string2]
fmt.Println("server: ", i[1]) //only want the second string in the array.
}
Now I have another string array: slice2 [][]string
I get its values using a for loop as well:
for _, value := range output { //
fmt.Println(value) //Prints: [ 200K, 2, "a", 22, aa-d-2, sd , MatchingString, a ]
}
I want to iterate through slice1 and check if the string2 matches "MatchingString" in Slice2. If it does, don't print the value array.
I created a for loop again to do this but it's not working:
for _, value := range slice2 {
for _, i := range slice1 {
if strings.Contains(value[0], i[1]) {
//skip over
} else {
fmt.Println(value)
}
}
}
Here's a sample code: https://play.golang.org/p/KMVzB2jlbG
Any idea on how to do this? Thanks!
If I'm reading your question correctly, you are trying to print all those subslices of slice2 that have the property that none of the strings within are the second element of a slice in slice1. If so, you can obtain that through
Slice2Loop:
for _, value := range slice2 {
for _, slice2string := range value {
for _, i := range slice1 {
if slice2string == i[1] {
continue Slice2Loop
}
}
}
fmt.Println(value)
}
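For larger inputs, the innermost loop can be swapped for a set lookup. The sketch below is a variation on the answer's loop, not the answer's own code: it builds a map from the second column of slice1 and returns the rows instead of printing them. The function name unmatchedRows and the sample data are made up for illustration:

```go
package main

import "fmt"

// unmatchedRows returns every row of slice2 that contains none of the
// second-column strings of slice1. The map is used as a set.
func unmatchedRows(slice1, slice2 [][]string) [][]string {
	blocked := make(map[string]struct{}, len(slice1))
	for _, row := range slice1 {
		if len(row) > 1 {
			blocked[row[1]] = struct{}{}
		}
	}
	var out [][]string
rows:
	for _, row := range slice2 {
		for _, s := range row {
			if _, found := blocked[s]; found {
				continue rows // skip the whole row on the first match
			}
		}
		out = append(out, row)
	}
	return out
}

func main() {
	slice1 := [][]string{{"host1", "MatchingString"}, {"host2", "other"}}
	slice2 := [][]string{
		{"200K", "2", "a", "MatchingString"},
		{"100K", "5", "b", "NoMatch"},
	}
	for _, row := range unmatchedRows(slice1, slice2) {
		fmt.Println(row) // [100K 5 b NoMatch]
	}
}
```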

Most idiomatic way to select elements from an array in Golang?

I have an array of strings, and I'd like to exclude values that start with foo_ OR are longer than 7 characters.
I can loop through each element, run the if statement, and add it to a slice along the way. But I was curious if there was an idiomatic or more golang-like way of accomplishing that.
Just for example, the same thing might be done in Ruby as
my_array.select! { |val| val !~ /^foo_/ && val.length <= 7 }
There is no one-liner as you have it in Ruby, but with a helper function you can make it almost as short.
Here's our helper function that loops over a slice, and selects and returns only the elements that meet a criterion captured by a function value:
func filter(ss []string, test func(string) bool) (ret []string) {
for _, s := range ss {
if test(s) {
ret = append(ret, s)
}
}
return
}
Starting with Go 1.18, we can make it generic so it works with all types, not just string:
func filter[T any](ss []T, test func(T) bool) (ret []T) {
for _, s := range ss {
if test(s) {
ret = append(ret, s)
}
}
return
}
Using this helper function, your task becomes:
ss := []string{"foo_1", "asdf", "loooooooong", "nfoo_1", "foo_2"}
mytest := func(s string) bool { return !strings.HasPrefix(s, "foo_") && len(s) <= 7 }
s2 := filter(ss, mytest)
fmt.Println(s2)
Output (try it on the Go Playground, or the generic version: Go Playground):
[asdf nfoo_1]
Note:
If it is expected that many elements will be selected, it might be profitable to allocate a "big" ret slice beforehand, and use simple assignment instead of the append(). And before returning, slice the ret to have a length equal to the number of selected elements.
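A minimal sketch of the preallocating variant described in this note; filterPrealloc is a hypothetical name, and the test function is the same one used in the example above:

```go
package main

import (
	"fmt"
	"strings"
)

// filterPrealloc allocates len(ss) elements up front, stores matches by
// index instead of using append, and trims the result before returning.
func filterPrealloc(ss []string, test func(string) bool) []string {
	ret := make([]string, len(ss))
	n := 0
	for _, s := range ss {
		if test(s) {
			ret[n] = s
			n++
		}
	}
	return ret[:n]
}

func main() {
	ss := []string{"foo_1", "asdf", "loooooooong", "nfoo_1", "foo_2"}
	keep := func(s string) bool { return !strings.HasPrefix(s, "foo_") && len(s) <= 7 }
	fmt.Println(filterPrealloc(ss, keep)) // [asdf nfoo_1]
}
```

The trade-off is memory: the backing array always has room for every input element, even when only a few are selected.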
Note #2:
In my example I chose a test() function which tells if an element is to be returned. So I had to invert your "exclusion" condition. Obviously you may write the helper function to expect a tester function which tells what to exclude (and not what to include).
Have a look at robpike's filter library. This would allow you to do:
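package main
import (
"fmt"
"strings"
"robpike.io/filter" // import path of the library; check its README if it has moved
)
func isNoFoo7(a string) bool {
return !strings.HasPrefix(a, "foo_") && len(a) <= 7
}
func main() {
a := []string{"test", "some_other_test", "foo_etc"}
// Choose must be qualified with the package name.
result := filter.Choose(a, isNoFoo7)
fmt.Println(result) // [test]
}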
Interestingly enough, the README.md by Rob says:
I wanted to see how hard it was to implement this sort of thing in Go, with as nice an API as I could manage. It wasn't hard.
Having written it a couple of years ago, I haven't had occasion to use it once. Instead, I just use "for" loops.
You shouldn't use it either.
So the most idiomatic way according to Rob would be something like:
func main() {
a := []string{"test", "some_other_test", "foo_etc"}
nofoos := []string{}
for _, s := range a {
if !strings.HasPrefix(s, "foo_") && len(s) <= 7 {
nofoos = append(nofoos, s)
}
}
fmt.Println(nofoos) // [test]
}
This style is very similar, if not identical, to the approach any C-family language takes.
Today, I stumbled on a pretty idiom that surprised me. If you want to filter a slice in place without allocating, use two slices with the same backing array:
s := []T{
// the input
}
s2 := s
s = s[:0]
for _, v := range s2 {
if shouldKeep(v) {
s = append(s, v)
}
}
Here's a specific example of removing duplicate strings:
s := []string{"a", "a", "b", "c", "c"}
s2 := s
s = s[:0]
var last string
for _, v := range s2 {
if len(s) == 0 || v != last {
last = v
s = append(s, v)
}
}
If you need to keep both slices, simply replace s = s[:0] with s = nil or s = make([]T, 0, len(s)), depending on whether you want append() to allocate for you.
There are a couple of nice ways to filter a slice without allocations or new dependencies. Found in the Go wiki on Github:
Filter (in place)
n := 0
for _, x := range a {
if keep(x) {
a[n] = x
n++
}
}
a = a[:n]
And another, more readable, way:
Filtering without allocating
This trick uses the fact that a slice shares the same backing array
and capacity as the original, so the storage is reused for the
filtered slice. Of course, the original contents are modified.
b := a[:0]
for _, x := range a {
if f(x) {
b = append(b, x)
}
}
For elements which must be garbage collected, the following code can
be included afterwards:
for i := len(b); i < len(a); i++ {
a[i] = nil // or the zero value of T
}
One thing I'm not sure about is whether the first method needs clearing (setting to nil) the items in slice a after index n, like they do in the second method.
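On that open point: a = a[:n] only shortens the slice header, so with the first method the elements beyond index n also stay in the backing array and remain reachable as long as that array is alive. A small sketch that shows this with pointer elements and clears the tail; filterInPlace is a hypothetical wrapper around the wiki loop:

```go
package main

import "fmt"

// filterInPlace is the "Filter (in place)" loop wrapped in a function,
// followed by the clearing step for the unused tail.
func filterInPlace(a []*int, keep func(*int) bool) []*int {
	n := 0
	for _, x := range a {
		if keep(x) {
			a[n] = x
			n++
		}
	}
	// Without this loop, a[n:] would still point at the old elements.
	for i := n; i < len(a); i++ {
		a[i] = nil
	}
	return a[:n]
}

func main() {
	one, two, three := 1, 2, 3
	a := []*int{&one, &two, &three}
	orig := a // second slice header over the same backing array
	a = filterInPlace(a, func(p *int) bool { return *p%2 == 1 })
	fmt.Println(len(a), *a[0], *a[1]) // 2 1 3
	fmt.Println(orig[2] == nil)       // true
}
```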
EDIT: the second way is basically what MicahStetson described in his answer. In my code I use a function similar to the following, which is probably as good as it gets in terms of performance and readability:
func filterSlice(slice []*T, keep func(*T) bool) []*T {
newSlice := slice[:0]
for _, item := range slice {
if keep(item) {
newSlice = append(newSlice, item)
}
}
// make sure discarded items can be garbage collected
for i := len(newSlice); i < len(slice); i++ {
slice[i] = nil
}
return newSlice
}
Note that if items in your slice are not pointers and don't contain pointers you can skip the second for loop.
There isn't an idiomatic way to achieve this in a single line in Go as in Ruby, but a helper function gives you nearly the same expressiveness.
You can call this helper function as:
Filter(strs, func(v string) bool {
return strings.HasPrefix(v, "foo_") // true only for "foo_testfor"
})
Here is the whole code:
package main
import "strings"
import "fmt"
// Returns a new slice containing all strings in the slice that
// satisfy the predicate `f` and are longer than 7 characters.
func Filter(vs []string, f func(string) bool) []string {
vsf := make([]string, 0)
for _, v := range vs {
if f(v) && len(v) > 7 {
vsf = append(vsf, v)
}
}
return vsf
}
func main() {
var strs = []string{"foo1", "foo2", "foo3", "foo3", "foo_testfor", "_foo"}
fmt.Println(Filter(strs, func(v string) bool {
return strings.HasPrefix(v, "foo_") // return foo_testfor
}))
}
And the running example: Playground
You can use the loop as you did and wrap it in a utility function for reuse.
To support multiple data types, copy-pasting the function per type is one choice. Another is writing a code generation tool.
A final option, if you want to use a library, is https://github.com/ledongthuc/goterators#filter, which I created to provide reusable aggregate and transform functions.
It requires Go 1.18, since it relies on generics to work with whatever element type you use.
filteredItems, err := Filter(list, func(item int) bool {
return item % 2 == 0
})
filteredItems, err := Filter(list, func(item string) bool {
return strings.Contains(item, "ValidWord")
})
filteredItems, err := Filter(list, func(item MyStruct) bool {
return item.Valid()
})
It also supports Reduce in case you want to optimize the way you select.
Hope it's useful to you!
"Select Elements from Array" is also commonly called a filter function. There's no such thing in Go. There are also no other "Collection Functions" such as map or reduce. For the most idiomatic way to get the desired result, I find https://gobyexample.com/collection-functions a good reference:
[...] in Go it’s common to provide collection functions if and when they are specifically needed for your program and data types.
They provide an implementation example of the filter function for strings:
func Filter(vs []string, f func(string) bool) []string {
vsf := make([]string, 0)
for _, v := range vs {
if f(v) {
vsf = append(vsf, v)
}
}
return vsf
}
However, they also say that it's often OK to just inline the function:
Note that in some cases it may be clearest to just inline the
collection-manipulating code directly, instead of creating and calling
a helper function.
In general, golang tries to only introduce orthogonal concepts, meaning that when you can solve a problem one way, there shouldn't be too many more ways to solve it. This adds simplicity to the language by only having a few core concepts, such that not every developer uses a different subset of the language.
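For completeness: since Go 1.21 the standard library's slices package provides DeleteFunc, which removes, in place, the elements for which a predicate returns true. It is not a general filter, but it matches the "exclude" wording of the question. A minimal sketch, assuming Go 1.21 or newer; the helper name excludeFooOrLong is made up:

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// excludeFooOrLong removes, in place, the strings that start with "foo_"
// or are longer than 7 characters. It modifies the slice passed in.
func excludeFooOrLong(ss []string) []string {
	return slices.DeleteFunc(ss, func(s string) bool {
		return strings.HasPrefix(s, "foo_") || len(s) > 7
	})
}

func main() {
	ss := []string{"foo_1", "asdf", "loooooooong", "nfoo_1", "foo_2"}
	fmt.Println(excludeFooOrLong(ss)) // [asdf nfoo_1]
}
```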
Take a look at this library: github.com/thoas/go-funk
It provides an implementation of a lot of life-saving idioms in Go (including filtering of elements in array for instance).
r := funk.Filter([]int{1, 2, 3, 4}, func(x int) bool {
return x%2 == 0
})
Here is an elegant example of both Fold and Filter that uses recursion to accomplish filtering. FoldRight is also generally useful. It is not stack safe but could be made so with trampolining. Once Golang has generics it can be entirely generalized for any 2 types:
func FoldRightStrings(as, z []string, f func(string, []string) []string) []string {
if len(as) > 1 { // Slice has a head and a tail.
h, t := as[0], as[1:]
return f(h, FoldRightStrings(t, z, f))
} else if len(as) == 1 { // Slice has a head and an empty tail.
h := as[0]
return f(h, FoldRightStrings([]string{}, z, f))
}
return z
}
func FilterStrings(as []string, p func(string) bool) []string {
var g = func(h string, accum []string) []string {
if p(h) {
// accum holds the filtered tail, so prepend h to keep the input order.
return append([]string{h}, accum...)
}
return accum
}
return FoldRightStrings(as, []string{}, g)
}
Here is an example of its usage to keep only the strings with length < 8:
var p = func(s string) bool {
return len(s) < 8
}
FilterStrings([]string{"asd","asdfas","asdfasfsa","asdfasdfsadfsadfad"}, p)
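With type parameters (Go 1.18 and later), the generalization to two types mentioned above might look like the sketch below. The names FoldRight and Filter are chosen here for illustration; like the string version, it recurses once per element, so it is neither stack safe nor efficient for long slices:

```go
package main

import "fmt"

// FoldRight folds as from the right: f(a0, f(a1, ... f(an, z))).
// A is the element type and B the accumulator type.
func FoldRight[A, B any](as []A, z B, f func(A, B) B) B {
	if len(as) == 0 {
		return z
	}
	return f(as[0], FoldRight(as[1:], z, f))
}

// Filter keeps the elements for which p returns true. Each kept head is
// prepended to the already filtered tail, which preserves the input order.
func Filter[A any](as []A, p func(A) bool) []A {
	return FoldRight(as, []A{}, func(h A, acc []A) []A {
		if p(h) {
			return append([]A{h}, acc...)
		}
		return acc
	})
}

func main() {
	short := func(s string) bool { return len(s) < 8 }
	fmt.Println(Filter([]string{"asd", "asdfas", "asdfasfsa", "asdfasdfsadfsadfad"}, short))
	// [asd asdfas]
	sum := FoldRight([]int{1, 2, 3}, 0, func(a, b int) int { return a + b })
	fmt.Println(sum) // 6
}
```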
I'm developing this library: https://github.com/jose78/go-collection. Please try this example to filter elements:
package main
import (
"fmt"
col "github.com/jose78/go-collection/collections"
)
type user struct {
name string
age int
id int
}
func main() {
newMap := generateMapTest()
if resultMap, err := newMap.FilterAll(filterEmptyName); err != nil {
fmt.Printf("error")
} else {
fmt.Printf("Result: %v\n", resultMap)
result := resultMap.ListValues()
fmt.Printf("Result: %v\n", result)
fmt.Printf("Result: %v\n", result.Reverse())
fmt.Printf("Result: %v\n", result.JoinAsString(" <---> "))
fmt.Printf("Result: %v\n", result.Reverse().JoinAsString(" <---> "))
result.Foreach(simpleLoop)
err := result.Foreach(simpleLoopWithError)
if err != nil {
fmt.Println(err)
}
}
}
func filterEmptyName(key interface{}, value interface{}) bool {
user := value.(user)
return user.name != "empty"
}
func generateMapTest() (container col.MapType) {
container = col.MapType{}
container[1] = user{"Alvaro", 6, 1}
container[2] = user{"Sofia", 3, 2}
container[3] = user{"empty", 0, -1}
return container
}
var simpleLoop col.FnForeachList = func(mapper interface{}, index int) {
fmt.Printf("%d.- item:%v\n", index, mapper)
}
var simpleLoopWithError col.FnForeachList = func(mapper interface{}, index int) {
if index > 0 {
panic(fmt.Sprintf("Error produced with index == %d\n", index))
}
fmt.Printf("%d.- item:%v\n", index, mapper)
}
Result of execution:
Result: map[1:{Alvaro 6 1} 2:{Sofia 3 2}]
Result: [{Sofia 3 2} {Alvaro 6 1}]
Result: [{Alvaro 6 1} {Sofia 3 2}]
Result: {Sofia 3 2} <---> {Alvaro 6 1}
Result: {Alvaro 6 1} <---> {Sofia 3 2}
0.- item:{Sofia 3 2}
1.- item:{Alvaro 6 1}
0.- item:{Sofia 3 2}
Recovered in f Error produced with index == 1
ERROR: Error produced with index == 1
Error produced with index == 1
The docs are currently located in the wiki section of the project. You can try it in this link. I hope you like it...
Regards...

Count similar array value

I'm trying to learn Go (or Golang) and can't seem to get it right. I have 2 text files, each containing a list of words. I'm trying to count the number of words that are present in both files.
Here is my code so far :
package main
import (
"fmt"
"log"
"net/http"
"bufio"
)
func stringInSlice(str string, list []string) bool {
for _, v := range list {
if v == str {
return true
}
}
return false
}
func main() {
// Texts URL
var list = "https://gist.githubusercontent.com/alexcesaro/c9c47c638252e21bd82c/raw/bd031237a56ae6691145b4df5617c385dffe930d/list.txt"
var url1 = "https://gist.githubusercontent.com/alexcesaro/4ebfa5a9548d053dddb2/raw/abb8525774b63f342e5173d1af89e47a7a39cd2d/file1.txt"
//Create storing arrays
var buffer [2000]string
var bufferUrl1 [40000]string
// Set a sibling counter
var sibling = 0
// Read and store text files
wordList, err := http.Get(list)
if err != nil {
log.Fatalf("Error while getting the url : %v", err)
}
defer wordList.Body.Close()
wordUrl1, err := http.Get(url1)
if err != nil {
log.Fatalf("Error while getting the url : %v", err)
}
defer wordUrl1.Body.Close()
streamList := bufio.NewScanner(wordList.Body)
streamUrl1 := bufio.NewScanner(wordUrl1.Body)
streamList.Split(bufio.ScanLines)
streamUrl1.Split(bufio.ScanLines)
var i = 0;
var j = 0;
//Fill arrays with each lines
for streamList.Scan() {
buffer[i] = streamList.Text()
i++
}
for streamUrl1.Scan() {
bufferUrl1[j] = streamUrl1.Text()
j++
}
//ERROR OCCURRING HERE :
// This code if i'm not wrong is supposed to compare through all the range of bufferUrl1 -> bufferUrl1 values with buffer values, then increment sibling and output FIND
for v := range bufferUrl1{
if stringInSlice(bufferUrl1, buffer) {
sibling++
fmt.Println("FIND")
}
}
// As a testing purpose thoses lines properly paste both array
// fmt.Println(buffer)
// fmt.Println(bufferUrl1)
}
But right now, my build doesn't even succeed. I'm only greeted with this message:
.\hello.go:69: cannot use bufferUrl1 (type [40000]string) as type string in argument to stringInSlice
.\hello.go:69: cannot use buffer (type [2000]string) as type []string in argument to stringInSlice
bufferUrl1 is an array: [40000]string. You meant to pass v (each string in bufferUrl1) rather than the whole array. However, with a single loop variable, range yields the index, not the element, so what you actually need is the second variable. The code below ignores the index using _.
type [2000]string is different from []string. In Go, arrays and slices are not the same. Read Go Slices: usage and internals. I've changed both variable declarations to use slices with the same initial length using make.
These are changes you need to make to compile.
Declarations:
// Create storing slices
buffer := make([]string, 2000)
bufferUrl1 := make([]string, 40000)
and the loop on Line 69:
for _, s := range bufferUrl1 {
if stringInSlice(s, buffer) {
sibling++
fmt.Println("FIND")
}
}
As a side-note, consider using a map instead of a slice for buffer for more efficient lookup instead of looping through the list in stringInSlice.
https://play.golang.org/p/UcaSVwYcIw has the fix for the comments below (you won't be able to make HTTP requests from the Playground).

How to read a file starting from a specific line number using Scanner?

I am new to Go and I am trying to write a simple script that reads a file line by line. I also want to save the progress (i.e. the last line number that was read) on the filesystem somewhere so that if the same file was given as the input to the script again, it starts reading the file from the line where it left off. Following is what I have started off with.
package main
// Package Imports
import (
"bufio"
"flag"
"fmt"
"log"
"os"
)
// Variable Declaration
var (
ConfigFile = flag.String("configfile", "../config.json", "Path to json configuration file.")
)
// The main function that reads the file and parses the log entries
func main() {
flag.Parse()
settings := NewConfig(*ConfigFile)
inputFile, err := os.Open(settings.Source)
if err != nil {
log.Fatal(err)
}
defer inputFile.Close()
scanner := bufio.NewScanner(inputFile)
for scanner.Scan() {
fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}
// Saves the current progress
func SaveProgress() {
}
// Get the line count from the progress to make sure
func GetCounter() {
}
I could not find any methods that deal with line numbers in bufio.Scanner. I know I can declare an integer, say counter := 0, and increment it each time a line is read with counter++. But the next time, how do I tell the scanner to start from a specific line? For example, if I read up to line 30, the next time I run the script with the same input file, how can I make the scanner start reading from line 31?
Update
One solution I can think of here is to use the counter as I stated above and use an if condition like the following.
scanner := bufio.NewScanner(inputFile)
for scanner.Scan() {
counter++
if counter > progress {
fmt.Println(scanner.Text())
}
}
I am pretty sure something like this would work, but it is still going to loop over the lines that we have already read. Please suggest a better way.
If you don't want to read but just skip the lines you read previously, you need to acquire the position where you left off.
The different solutions are presented in a form of a function which takes the input to read from and the start position (byte position) to start reading lines from, e.g.:
func solution(input io.ReadSeeker, start int64) error
A special io.Reader input is used which also implements io.Seeker, the common interface which allows skipping data without having to read it. *os.File implements this, so you are allowed to pass an *os.File to these functions. Good. The "merged" interface of both io.Reader and io.Seeker is io.ReadSeeker.
If you want a clean start (to start reading from the beginning of the file), simply pass start = 0. If you want to resume a previous processing, pass the byte position where the last processing was stopped/aborted. This position is the value of the pos local variable in the functions (solutions) below.
All the examples below with their testing code can be found on the Go Playground.
1. With bufio.Scanner
bufio.Scanner does not maintain the position, but we can very easily extend it to maintain the position (the read bytes), so when we want to restart next, we can seek to this position.
In order to do this with minimal effort, we can use a new split function which splits the input into tokens (lines). We can use Scanner.Split() to set the splitter function (the logic to decide where are the boundaries of tokens/lines). The default split function is bufio.ScanLines().
Let's take a look at the split function declaration: bufio.SplitFunc
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
It returns the number of bytes to advance: advance. Exactly what we need to maintain the file position. So we can create a new split function using the builtin bufio.ScanLines(), so we don't even have to implement its logic, just use the advance return value to maintain position:
func withScanner(input io.ReadSeeker, start int64) error {
fmt.Println("--SCANNER, start:", start)
if _, err := input.Seek(start, 0); err != nil {
return err
}
scanner := bufio.NewScanner(input)
pos := start
scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
advance, token, err = bufio.ScanLines(data, atEOF)
pos += int64(advance)
return
}
scanner.Split(scanLines)
for scanner.Scan() {
fmt.Printf("Pos: %d, Scanned: %s\n", pos, scanner.Text())
}
return scanner.Err()
}
2. With bufio.Reader
In this solution we use the bufio.Reader type instead of the Scanner. bufio.Reader already has a ReadBytes() method which is very similar to the "read a line" functionality if we pass the '\n' byte as the delimiter.
This solution is similar to JimB's, with the addition of handling all valid line terminator sequences and also stripping them off from the read line (it is very rare they are needed); in regular expression notation, it is \r?\n.
func withReader(input io.ReadSeeker, start int64) error {
fmt.Println("--READER, start:", start)
if _, err := input.Seek(start, 0); err != nil {
return err
}
r := bufio.NewReader(input)
pos := start
for {
data, err := r.ReadBytes('\n')
pos += int64(len(data))
if err == nil || err == io.EOF {
if len(data) > 0 && data[len(data)-1] == '\n' {
data = data[:len(data)-1]
}
if len(data) > 0 && data[len(data)-1] == '\r' {
data = data[:len(data)-1]
}
fmt.Printf("Pos: %d, Read: %s\n", pos, data)
}
if err != nil {
if err != io.EOF {
return err
}
break
}
}
return nil
}
Note: If the content ends with an empty line (line terminator), this solution will process an empty line. If you don't want this, you can simply check it like this:
if len(data) != 0 {
fmt.Printf("Pos: %d, Read: %s\n", pos, data)
} else {
// Last line is empty, omit it
}
Testing the solutions:
Testing code will simply use the content "first\r\nsecond\nthird\nfourth" which contains multiple lines with varying line terminators. We will use strings.NewReader() to obtain an io.ReadSeeker whose source is a string.
Test code first calls withScanner() and withReader() passing 0 start position: a clean start. In the next round we will pass a start position of start = 14 which is the position of the third line, so we won't see the first 2 lines processed (printed): resume simulation.
func main() {
const content = "first\r\nsecond\nthird\nfourth"
if err := withScanner(strings.NewReader(content), 0); err != nil {
fmt.Println("Scanner error:", err)
}
if err := withReader(strings.NewReader(content), 0); err != nil {
fmt.Println("Reader error:", err)
}
if err := withScanner(strings.NewReader(content), 14); err != nil {
fmt.Println("Scanner error:", err)
}
if err := withReader(strings.NewReader(content), 14); err != nil {
fmt.Println("Reader error:", err)
}
}
Output:
--SCANNER, start: 0
Pos: 7, Scanned: first
Pos: 14, Scanned: second
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 0
Pos: 7, Read: first
Pos: 14, Read: second
Pos: 20, Read: third
Pos: 26, Read: fourth
--SCANNER, start: 14
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 14
Pos: 20, Read: third
Pos: 26, Read: fourth
Try the solutions and testing code on the Go Playground.
Instead of using a Scanner, use a bufio.Reader, specifically the ReadBytes or ReadString methods. This way you can read up to each line termination, and still receive the full line with line endings.
r := bufio.NewReader(inputFile)
var line []byte
fPos := 0 // or saved position
for i := 1; ; i++ {
line, err = r.ReadBytes('\n')
fmt.Printf("[line:%d pos:%d] %q\n", i, fPos, line)
if err != nil {
break
}
fPos += len(line)
}
if err != io.EOF {
log.Fatal(err)
}
You can store the combination of file position and line number however you choose, and the next time you start, you use inputFile.Seek(int64(fPos), io.SeekStart) to move to where you left off.
If you want to use Scanner, you have to go through the beginning of the file until you have skipped GetCounter() lines.
scanner := bufio.NewScanner(inputFile)
// context line above
// skip first GetCounter() lines
for i := 0; i < GetCounter(); i++ {
scanner.Scan()
}
// context line below
for scanner.Scan() {
fmt.Println(scanner.Text())
}
Alternatively, you could store the offset instead of the line number in the counter. Remember, though, that the termination token is stripped when using Scanner, and for a new line the token is \r?\n (regexp notation), so it isn't clear whether you should add 1 or 2 to the text length:
// Not clear how to store offset unless custom SplitFunc provided
inputFile.Seek(GetCounter(), 0)
scanner := bufio.NewScanner(inputFile)
So it is better to use the previous solution or not to use Scanner at all.
There are a lot of words in the other answers, and they're not really reusable code, so here's a reusable function that seeks to the given line number and returns the line and the offset where it starts.
func SeekToLine(r io.Reader, lineNo int) (line []byte, offset int, err error) {
s := bufio.NewScanner(r)
var pos int
s.Split(func(data []byte, atEof bool) (advance int, token []byte, err error) {
advance, token, err = bufio.ScanLines(data, atEof)
pos += advance
return advance, token, err
})
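for i := 0; i < lineNo; i++ {
offset = pos
if !s.Scan() {
if err := s.Err(); err != nil {
return nil, 0, err
}
return nil, 0, io.EOF
}
}
// offset was taken before the last Scan, so it is where the returned line starts.
return s.Bytes(), offset, nil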
}

Resources