I mainly need to read a specific range of lines in a file and return true if a given string (say "Hello World!" for example) is found within that range, but I'm not sure how to do so. I know how to read individual lines and whole files, but not ranges of lines. Are there any libraries that can assist, or is there a simple script to do it with? Any help is greatly appreciated!
Something like this?
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
)

// Find reports whether needle occurs on any line numbered from..to
// (1-based, inclusive) of the file fname.
func Find(fname string, from, to int, needle []byte) (bool, error) {
	f, err := os.Open(fname)
	if err != nil {
		return false, err
	}
	defer f.Close()

	n := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		n++
		if n < from {
			continue
		}
		if n > to {
			break
		}
		if bytes.Contains(scanner.Bytes(), needle) {
			return true, nil
		}
	}
	return false, scanner.Err()
}

func main() {
	found, err := Find("test.file", 18, 27, []byte("Hello World"))
	fmt.Println(found, err)
}
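One caveat with this approach: bufio.Scanner's default maximum token size is 64KB, so a file with very long lines needs a larger buffer. A sketch of the adjustment, made before the first Scan call:

	scanner := bufio.NewScanner(f)
	// Allow lines up to 1MB instead of the 64KB default.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)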
If you're already iterating over a slice of lines with for ... range, you can simply slice the range you need (a fuller sketch follows):

	for _, line := range file[2:40] {
		// do stuff
	}
I have a file of a huge size, for example 100MB, and I need to chunk it into four 25MB files using golang.
The thing here is, if I use a goroutine to read the file, the order of the data inside the output files is not preserved. The code I used is:
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"sync"

	"github.com/google/uuid"
)

func main() {
	file, err := os.Open("sampletest.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	lines := make(chan string)
	// start four workers to do the heavy lifting
	wc1 := startWorker(lines)
	wc2 := startWorker(lines)
	wc3 := startWorker(lines)
	wc4 := startWorker(lines)
	scanner := bufio.NewScanner(file)

	go func() {
		defer close(lines)
		for scanner.Scan() {
			lines <- scanner.Text()
		}
		if err := scanner.Err(); err != nil {
			log.Fatal(err)
		}
	}()

	writefiles(wc1, wc2, wc3, wc4)
}

func writefile(data string) {
	file, err := os.Create("chunks/" + uuid.New().String() + ".txt")
	if err != nil {
		fmt.Println(err)
	}
	defer file.Close()

	file.WriteString(data)
}

func startWorker(lines <-chan string) <-chan string {
	finished := make(chan string)
	go func() {
		defer close(finished)
		for line := range lines {
			finished <- line
		}
	}()
	return finished
}

func writefiles(cs ...<-chan string) {
	var wg sync.WaitGroup

	output := func(c <-chan string) {
		var d string
		for n := range c {
			d += n
			d += "\n"
		}
		writefile(d)
		wg.Done()
	}

	wg.Add(len(cs))
	for _, c := range cs {
		go output(c)
	}

	go func() {
		wg.Wait()
	}()
}
Using this code my file got split into four equal files, but the order in them is not preserved.
I am very new to golang; any suggestions are highly appreciated.
I took this code from some site and tweaked it here and there to meet my requirements.
Based on your statement, you should be able to modify the code from running concurrently to sequentially; that is far easier than adding concurrency to existing code.
The work is basically just: remove the concurrent part.
Anyway, below is a simple example of how to achieve what you want. I used your code as the base and removed everything related to concurrency.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/google/uuid"
)

func main() {
	split := 4

	file, err := os.Open("file.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	texts := make([]string, 0)
	for scanner.Scan() {
		text := scanner.Text()
		texts = append(texts, text)
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}

	lengthPerSplit := len(texts) / split
	for i := 0; i < split; i++ {
		if i+1 == split {
			// the last chunk also receives any leftover lines
			chunkTexts := texts[i*lengthPerSplit:]
			writefile(strings.Join(chunkTexts, "\n"))
		} else {
			chunkTexts := texts[i*lengthPerSplit : (i+1)*lengthPerSplit]
			writefile(strings.Join(chunkTexts, "\n"))
		}
	}
}

func writefile(data string) {
	file, err := os.Create("chunks-" + uuid.New().String() + ".txt")
	if err != nil {
		fmt.Println(err)
		return // avoid a nil-pointer dereference on Close/WriteString below
	}
	defer file.Close()

	file.WriteString(data)
}
Here is a simple file splitter. You can handle the leftovers yourself; I wrote the leftover bytes to a 5th file.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

func main() {
	file, err := os.Open("sample-text-file.txt")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// to divide the file into four chunks
	info, _ := file.Stat()
	chunkSize := int(info.Size() / 4)

	// reader of chunk size
	bufR := bufio.NewReaderSize(file, chunkSize)

	// Notice the range over a slice of len 5; after 4 chunks the leftover
	// will be written to the 5th file
	for i := range [5]int{} {
		reader := make([]byte, chunkSize)
		rlen, err := bufR.Read(reader)
		fmt.Println("Read: ", rlen)
		if err == io.EOF {
			// no leftover: the file size was an exact multiple of 4
			break
		}
		if err != nil {
			panic(err)
		}
		writeFile(i, rlen, &reader)
	}
}

// Notice bufW as a pointer to avoid exchange of big byte slices
func writeFile(i int, rlen int, bufW *[]byte) {
	fname := fmt.Sprintf("file_%v", i)
	f, err := os.Create(fname)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	w := bufio.NewWriterSize(f, rlen)
	wbytes := *(bufW)
	wLen, err := w.Write(wbytes[:rlen])
	if err != nil {
		panic(err)
	}
	fmt.Println("Wrote ", wLen, "to", fname)
	w.Flush()
}
I'm writing a program that performs math on matrices. I want to load them in from a csv file, and I have the following code:
file, err := os.Open("matrix1.csv")
if err != nil {
	log.Fatal(err)
}
defer file.Close()

lines, _ := csv.NewReader(file).ReadAll()
for i, line := range lines {
	for j, val := range line {
		valInt, err := strconv.Atoi(val)
		if err != nil {
			log.Fatal(err)
		}
		matrix1[i][j] = valInt
	}
}
However, the strconv code is throwing an error:
strconv.ParseInt: parsing "": invalid syntax
Everything else in the code appears to be correct; does anyone have any ideas on how to solve this error?
EDIT: I'm now trying to work on outputting my result to a new csv file.
I have the following code:
file2, err := os.Create("result.csv")
if err != nil {
	log.Fatal(err)
}
defer file1.Close()

writer := csv.NewWriter(file2)
for line2 := range blank {
	writer.Write(line2)
}
This gives the following error:
cannot use line2 (type int) as type []string in argument to writer.Write
Updated with the suggestions from the comments; however, the above error is now seen.
This means one of the cells of your CSV is blank. I reproduced the error with this code:
package main

import (
	"encoding/csv"
	"log"
	"strconv"
	"strings"
)

func main() {
	matrix1 := [5][5]int{}
	file := strings.NewReader("1,2,3,4,5\n6,7,8,,0")
	lines, _ := csv.NewReader(file).ReadAll()
	for i, line := range lines {
		for j, val := range line {
			valInt, err := strconv.Atoi(val)
			if err != nil {
				log.Fatal(err)
			}
			matrix1[i][j] = valInt
		}
	}
}
If you are ok with treating blank cells as 0 this will get you past the error:
func main() {
	matrix1 := [5][5]int{}
	file := strings.NewReader("1,2,3,4,5\n6,7,8,,0")
	lines, _ := csv.NewReader(file).ReadAll()
	for i, line := range lines {
		for j, val := range line {
			var valInt int
			var err error
			if val == "" {
				valInt = 0
			} else {
				valInt, err = strconv.Atoi(val)
			}
			if err != nil {
				log.Fatal(err)
			}
			matrix1[i][j] = valInt
		}
	}
}
As mentioned in the comments above, the error was due to a missing value in my csv file.
Once the file was amended, the error was gone.
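As for the follow-up error in the edit: a single-variable range over a slice yields the index (an int), and csv.Writer.Write expects one []string per record, so each row of an integer matrix has to be converted field by field. A minimal sketch, assuming the result matrix is a [][]int named result:

	writer := csv.NewWriter(file2)
	defer writer.Flush() // Flush is required, or the output may stay buffered

	for _, row := range result { // two-variable range: index and row
		record := make([]string, len(row))
		for j, v := range row {
			record[j] = strconv.Itoa(v)
		}
		if err := writer.Write(record); err != nil {
			log.Fatal(err)
		}
	}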
In this code, I read a text file for input (A1,B2) and use strings.Split to separate the values at the comma and store them in strs. According to the function definition, Split returns a slice, in this case the strs slice. I want the first element of strs to go into currentSource and the second into currentDest. I tried printing both variables individually to check whether it works, but the program exits and I get an error saying panic: index out of range.
Can anybody help me out?
var currentSource string
var currentDest string

func main() {
	file, err := os.Open("chessin.txt")
	if err != nil {
		fmt.Println(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		strs := strings.Split(scanner.Text(), ",")
		currentSource = strs[0]
		currentDest = strs[1]
	}
}
This works
var currentSource string
var currentDest string

func main() {
	file := "A1,B2\n"
	scanner := bufio.NewScanner(strings.NewReader(file))
	for scanner.Scan() {
		strs := strings.Split(scanner.Text(), ",")
		currentSource = strs[0]
		currentDest = strs[1]
		fmt.Println(strs)
	}
}
Are you sure your file (chessin.txt) is ok?
Playground
This code is almost the same as your code and it works correctly:
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

var currentSource string
var currentDest string

func main() {
	// content of this file is (no spaces between commas, no \r or
	// \n or any other whitespace):
	// C3,F3,C4,A4,C5,A1
	file, err := os.Open("chessin.txt")
	if err != nil {
		fmt.Println(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		strs := strings.Split(scanner.Text(), ",")
		if len(strs) < 2 {
			panic(`not enough elements in the file, to proceed`)
		}
		currentSource = strs[0]
		currentDest = strs[1]
		break // (B)
	}
	if currentSource != "C3" {
		panic(`currentSource IS NOT C3`)
	}
	if currentDest != "F3" {
		panic(`currentDest is not F3`)
	}
	// if we are here, then we are good
	fmt.Println(currentSource, currentDest) // (A)
}
Just pay attention to the break statement at (B). It stops the for loop after the first and second elements of the first line have been read, which may or may not be what you want.
So, if the program does not reach point (A), then there is something wrong with chessin.txt.
I had this convenient function in Python:
import time

def follow(path):
    with open(path) as lines:
        lines.seek(0, 2)  # seek to EOF
        while True:
            line = lines.readline()
            if not line:
                time.sleep(0.1)
                continue
            yield line
It does something similar to UNIX tail -f: you get the last lines of a file as they come in. It's convenient because you can get the generator without blocking and pass it to another function.
Then I had to do the same thing in Go. I'm new to this language, so I'm not sure whether what I did is idiomatic/correct enough for Go.
Here is the code:
func Follow(fileName string) chan string {
	out_chan := make(chan string)

	file, err := os.Open(fileName)
	if err != nil {
		log.Fatal(err)
	}

	file.Seek(0, os.SEEK_END)
	bf := bufio.NewReader(file)

	go func() {
		for {
			line, _, _ := bf.ReadLine()
			if len(line) == 0 {
				time.Sleep(10 * time.Millisecond)
			} else {
				out_chan <- string(line)
			}
		}
		defer file.Close()
		close(out_chan)
	}()

	return out_chan
}
Is there any cleaner way to do this in Go? I have a feeling that using an asynchronous call for such a thing is overkill, and it really bothers me.
Create a wrapper around a reader that sleeps on EOF:
type tailReader struct {
	io.ReadCloser
}

func (t tailReader) Read(b []byte) (int, error) {
	for {
		n, err := t.ReadCloser.Read(b)
		if n > 0 {
			return n, nil
		} else if err != io.EOF {
			return n, err
		}
		time.Sleep(10 * time.Millisecond)
	}
}

func newTailReader(fileName string) (tailReader, error) {
	f, err := os.Open(fileName)
	if err != nil {
		return tailReader{}, err
	}
	if _, err := f.Seek(0, 2); err != nil {
		return tailReader{}, err
	}
	return tailReader{f}, nil
}
This reader can be used anywhere an io.Reader can be used. Here's how to loop over lines using bufio.Scanner:
t, err := newTailReader("somefile")
if err != nil {
	log.Fatal(err)
}
defer t.Close()

scanner := bufio.NewScanner(t)
for scanner.Scan() {
	fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
	fmt.Fprintln(os.Stderr, "reading:", err)
}
The reader can also be used to loop over JSON values appended to the file:
t, err := newTailReader("somefile")
if err != nil {
	log.Fatal(err)
}
defer t.Close()

dec := json.NewDecoder(t)
for {
	var v SomeType
	if err := dec.Decode(&v); err != nil {
		log.Fatal(err)
	}
	fmt.Println("the value is ", v)
}
There are a couple of advantages this approach has over the goroutine approach outlined in the question. The first is that shutdown is easy. Just close the file. There's no need to signal the goroutine that it should exit. The second advantage is that many packages work with io.Reader.
The sleep time can be adjusted up or down to meet specific needs. Decrease the time for lower latency and increase the time to reduce CPU use. A sleep of 100ms is probably fast enough for data that's displayed to humans.
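To illustrate how easy shutdown is: because the tail is just a file wrapped in an io.Reader, stopping it only requires closing the file, e.g. from a timer. A sketch reusing newTailReader from above:

	t, err := newTailReader("somefile")
	if err != nil {
		log.Fatal(err)
	}
	// Close the file after a minute; the next Read then fails with
	// os.ErrClosed, which ends the scanner loop below.
	time.AfterFunc(time.Minute, func() { t.Close() })

	scanner := bufio.NewScanner(t)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}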
Check out this Go package for reading from continuously updated files (tail -f): https://github.com/hpcloud/tail
t, err := tail.TailFile("filename", tail.Config{Follow: true})
if err != nil {
	log.Fatal(err)
}
for line := range t.Lines {
	fmt.Println(line.Text)
}
Is there any simple/fast way to copy a file in Go?
I couldn't find a fast way in the docs, and searching the internet doesn't help either.
Warning: This answer is mainly about adding a hard link to a file, not about copying the contents.
A robust and efficient copy is conceptually simple, but not simple to implement due to the need to handle a number of edge cases and system limitations that are imposed by the target operating system and its configuration.
If you simply want to make a duplicate of the existing file you can use os.Link(srcName, dstName). This avoids having to move bytes around in the application and saves disk space. For large files, this is a significant time and space saving.
But various operating systems have different restrictions on how hard links work. Depending on your application and your target system configuration, Link() calls may not work in all cases.
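In its simplest form the call is a one-liner with a fallback when the link fails (file names here are placeholders):

	// Try a hard link first; if the OS or filesystem refuses
	// (different volume, unsupported file type, ...), fall back to copying bytes.
	if err := os.Link("src.txt", "dst.txt"); err != nil {
		log.Printf("link failed, falling back to a byte copy: %v", err)
		// ... call a byte-copy routine such as copyFileContents below ...
	}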
If you want a single generic, robust and efficient copy function, update Copy() to:
Perform checks to ensure that at least some form of copy will succeed (access permissions, directories exist, etc.)
Check whether both files already exist and are the same using os.SameFile; return success if they are
Attempt a Link; return if it succeeds
Copy the bytes (all efficient means failed); return the result
An optimization would be to copy the bytes in a goroutine so the caller doesn't block on the byte copy. Doing so imposes additional complexity on the caller to handle the success/error case properly.
If I wanted both, I would have two different copy functions: CopyFile(src, dst string) error for a blocking copy, and CopyFileAsync(src, dst string) <-chan error, which passes a signaling channel back to the caller for the asynchronous case.
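For example, the asynchronous variant could be a thin wrapper over the blocking CopyFile below; a minimal sketch:

	// CopyFileAsync starts the copy in a goroutine and returns a channel
	// that delivers the final error (nil on success) exactly once.
	func CopyFileAsync(src, dst string) <-chan error {
		done := make(chan error, 1)
		go func() {
			done <- CopyFile(src, dst)
		}()
		return done
	}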
package main

import (
	"fmt"
	"io"
	"os"
)

// CopyFile copies a file from src to dst. If src and dst files exist, and are
// the same, then return success. Otherwise, attempt to create a hard link
// between the two files. If that fails, copy the file contents from src to dst.
func CopyFile(src, dst string) (err error) {
	sfi, err := os.Stat(src)
	if err != nil {
		return
	}
	if !sfi.Mode().IsRegular() {
		// cannot copy non-regular files (e.g., directories,
		// symlinks, devices, etc.)
		return fmt.Errorf("CopyFile: non-regular source file %s (%q)", sfi.Name(), sfi.Mode().String())
	}
	dfi, err := os.Stat(dst)
	if err != nil {
		if !os.IsNotExist(err) {
			return
		}
	} else {
		if !(dfi.Mode().IsRegular()) {
			return fmt.Errorf("CopyFile: non-regular destination file %s (%q)", dfi.Name(), dfi.Mode().String())
		}
		if os.SameFile(sfi, dfi) {
			return
		}
	}
	if err = os.Link(src, dst); err == nil {
		return
	}
	err = copyFileContents(src, dst)
	return
}
// copyFileContents copies the contents of the file named src to the file named
// by dst. The file will be created if it does not already exist. If the
// destination file exists, all its contents will be replaced by the contents
// of the source file.
func copyFileContents(src, dst string) (err error) {
	in, err := os.Open(src)
	if err != nil {
		return
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return
	}
	defer func() {
		cerr := out.Close()
		if err == nil {
			err = cerr
		}
	}()
	if _, err = io.Copy(out, in); err != nil {
		return
	}
	err = out.Sync()
	return
}
func main() {
	fmt.Printf("Copying %s to %s\n", os.Args[1], os.Args[2])
	err := CopyFile(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Printf("CopyFile failed %q\n", err)
	} else {
		fmt.Printf("CopyFile succeeded\n")
	}
}
import (
	"io/ioutil"
	"log"
)

func checkErr(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func copy(src string, dst string) {
	// Read all content of src to data; may cause OOM for a large file.
	data, err := ioutil.ReadFile(src)
	checkErr(err)
	// Write data to dst
	err = ioutil.WriteFile(dst, data, 0644)
	checkErr(err)
}
If you are running the code on Linux/Mac, you could just execute the system's cp command.
srcFolder := "copy/from/path"
destFolder := "copy/to/path"
cpCmd := exec.Command("cp", "-rf", srcFolder, destFolder)
err := cpCmd.Run()
It's treating Go a bit like a script, but it gets the job done. Also, you need to import "os/exec".
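Put together as a runnable sketch (the paths are placeholders):

	package main

	import (
		"log"
		"os/exec"
	)

	func main() {
		srcFolder := "copy/from/path"
		destFolder := "copy/to/path"

		cpCmd := exec.Command("cp", "-rf", srcFolder, destFolder)
		if err := cpCmd.Run(); err != nil {
			log.Fatal(err)
		}
	}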
Starting with Go 1.15 (Aug 2020), you can use File.ReadFrom:
package main

import "os"

func main() {
	r, err := os.Open("in.txt")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	w, err := os.Create("out.txt")
	if err != nil {
		panic(err)
	}
	defer w.Close()

	w.ReadFrom(r)
}
This Copy function:
Performs the copy in a stream, using io.Copy.
Closes all opened file descriptors.
Checks all errors that should be checked, including the errors in deferred (*os.File).Close calls.
Gracefully handles multiple non-nil errors, e.g. non-nil errors from both io.Copy and (*os.File).Close.
Avoids unnecessary complications present in other answers, such as calling Close twice on the same file but ignoring the error on one of the calls.
Performs no unnecessary stat checks for existence or for file type. Such checks aren't necessary: the later open and read operations will return an error anyway if it's not a valid operation for the type of file. Secondly, such checks are prone to races (e.g. the file might be removed in the time between stat and open).
Has an accurate doc comment. See: "file", "regular file", and behavior when dstpath exists. The doc comment also matches the style of other functions in package os.
// Copy copies the contents of the file at srcpath to a regular file at dstpath.
// If dstpath already exists and is not a directory, the function truncates it.
// The function does not copy file modes or file attributes.
func Copy(srcpath, dstpath string) (err error) {
	r, err := os.Open(srcpath)
	if err != nil {
		return err
	}
	defer r.Close() // ok to ignore error: file was opened read-only.

	w, err := os.Create(dstpath)
	if err != nil {
		return err
	}

	defer func() {
		e := w.Close()
		// Report the error from Close, if any,
		// but do so only if there isn't already
		// an outgoing error.
		if e != nil && err == nil {
			err = e
		}
	}()

	_, err = io.Copy(w, r)
	return err
}
In this case there are a couple of conditions to verify, and I prefer non-nested code:
func Copy(src, dst string) (int64, error) {
	srcFile, err := os.Open(src)
	if err != nil {
		return 0, err
	}
	defer srcFile.Close()

	srcFileStat, err := srcFile.Stat()
	if err != nil {
		return 0, err
	}

	if !srcFileStat.Mode().IsRegular() {
		return 0, fmt.Errorf("%s is not a regular file", src)
	}

	dstFile, err := os.Create(dst)
	if err != nil {
		return 0, err
	}
	defer dstFile.Close()

	return io.Copy(dstFile, srcFile)
}
If you are on Windows, you can wrap CopyFileW like this:
package utils

import (
	"syscall"
	"unsafe"
)

var (
	modkernel32   = syscall.NewLazyDLL("kernel32.dll")
	procCopyFileW = modkernel32.NewProc("CopyFileW")
)

// CopyFile wraps the Windows function CopyFileW
func CopyFile(src, dst string, failIfExists bool) error {
	lpExistingFileName, err := syscall.UTF16PtrFromString(src)
	if err != nil {
		return err
	}
	lpNewFileName, err := syscall.UTF16PtrFromString(dst)
	if err != nil {
		return err
	}
	var bFailIfExists uint32
	if failIfExists {
		bFailIfExists = 1
	} else {
		bFailIfExists = 0
	}
	r1, _, err := syscall.Syscall(
		procCopyFileW.Addr(),
		3,
		uintptr(unsafe.Pointer(lpExistingFileName)),
		uintptr(unsafe.Pointer(lpNewFileName)),
		uintptr(bFailIfExists))
	if r1 == 0 {
		return err
	}
	return nil
}
Code is inspired by wrappers in C:\Go\src\syscall\zsyscall_windows.go
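Usage is then a single call; a sketch from within the same package (the paths are illustrative):

	// Copy src to dst, failing if dst already exists.
	if err := CopyFile(`C:\tmp\src.txt`, `C:\tmp\dst.txt`, true); err != nil {
		log.Fatal(err)
	}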
Here is an obvious way to copy a file:
package main

import (
	"io"
	"log"
	"os"
)

func main() {
	sFile, err := os.Open("test.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer sFile.Close()

	eFile, err := os.Create("test_copy.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer eFile.Close()

	_, err = io.Copy(eFile, sFile) // first var shows number of bytes
	if err != nil {
		log.Fatal(err)
	}

	err = eFile.Sync()
	if err != nil {
		log.Fatal(err)
	}
}
You can use "exec".
exec.Command("cmd", "/c", "copy", "fileToBeCopied", "destinationDirectory") for Windows; note that the source and destination are passed as separate arguments.
I have used this and it's working fine. You can refer to the manual for more details on exec.
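A runnable sketch with the error checked (the paths are placeholders):

	package main

	import (
		"log"
		"os/exec"
	)

	func main() {
		// cmd /c copy <src> <dst>; each argument is passed separately.
		cmd := exec.Command("cmd", "/c", "copy", `C:\tmp\src.txt`, `C:\tmp\dst.txt`)
		if err := cmd.Run(); err != nil {
			log.Fatal(err)
		}
	}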