How to split a File Source into Strings or Words

I have a file with content like this:
"Some","Words","separated","by","comma","and","quoted","with","double","quotes"
The file is too large to read into a single String.
What is the simplest way to split it into a Traversable of Strings, with each element being a word?
If it matters: while the content of the file won't fit in a single String, the resulting Traversable could be a List without a problem.

Here is an adaptation of your own solution, using JavaConversions to manipulate the Java iterator as a Scala one.
import java.util.Scanner
import java.io.File
import scala.collection.JavaConversions._
val scanner = new Scanner(new File("...")).useDelimiter(",")
scanner.map(_.trim).map(quoted => quoted.substring(1, quoted.length - 1))
This gives you an iterator. You can always convert it to a list using e.g. .toList.
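Note that scala.collection.JavaConversions was deprecated in Scala 2.12 and removed in 2.13. As a minimal sketch that does not depend on any implicit conversion, the Scanner can be wrapped in a Scala Iterator by hand. The object name ScannerWords and the method name words are made up for illustration, and a String stands in for the large file so the example is self-contained:

```scala
import java.util.Scanner

object ScannerWords {
  // Wrap a java.util.Scanner in a Scala Iterator by hand (no implicit conversions),
  // then trim each token and strip the surrounding double quotes.
  def words(scanner: Scanner): Iterator[String] = {
    val raw = new Iterator[String] {
      def hasNext: Boolean = scanner.hasNext()
      def next(): String = scanner.next()
    }
    raw.map(_.trim.stripPrefix("\"").stripSuffix("\""))
  }

  def main(args: Array[String]): Unit = {
    // A String stands in for the big file; for real input use new Scanner(new File(path)).
    val scanner = new Scanner("\"Some\",\"Words\",\"separated\"").useDelimiter(",")
    println(words(scanner).toList) // List(Some, Words, separated)
  }
}
```

The iterator is lazy, so tokens are read from the Scanner only as they are consumed; calling .toList forces the whole input to be read.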

Here is a version using stringLit and repsep from Scala parser combinators. I won't vouch for its efficiency, though.
import scala.util.parsing.combinator.syntactical.StdTokenParsers
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.StreamReader
import java.io.FileReader
object P extends StdTokenParsers {
type Tokens = StdLexical
val lexical = new StdLexical
lexical.delimiters += ","
def words : Parser[List[String]] = repsep(stringLit, ",")
def getWords(fileName : String) : List[String] = {
val scanner = new lexical.Scanner(StreamReader(new FileReader(fileName)))
// better error handling wouldn't hurt.
words(scanner).get
}
}

I did it using java.util.Scanner. While it does work, I'd appreciate a more Scala-esque version.
val scanner = new Scanner(new File("""bigFile.txt""")).useDelimiter(",")
var wordList: Vector[String] = Vector()
while (scanner.hasNext()) {
val quoted = scanner.next()
val word = quoted.replace("\"", "")
wordList = wordList :+ word
}

Appending integers to an array in Scala, but printing the array (even while using .mkString("")) doesn't show anything

I'm reading a text file and have the syntax set up correctly to do so. What I want to do now is append all the integers into an array, but when I try to use a print statement to check what's going on, nothing shows up in the terminal.
package lecture
import scala.io.{BufferedSource, Source}
object LectureQuestion {
def fileSum(fileName: String): Int = {
var arrayOfnumbers = Array[String]()
var fileOfnumbers: BufferedSource = Source.fromFile(fileName)
for (line <- fileOfnumbers.getLines()){
val splits: Array[String] =line.split("#")
for (number <- splits){
arrayOfnumbers :+ number
println(arrayOfnumbers.mkString(""))
}
//println(splits.mkString(" "))
}
3
}
def main(args: Array[String]): Unit = {
println(fileSum("data/fileOfnumbers.txt"))
}
}
I set up a blank array to append the numbers to. I tried switching var to val, but that wouldn't make sense, as var is mutable, meaning it can change. I'm pretty sure the way to add things to an array in Scala is :+, so I'm not sure what's going on.
In Scala, all you need to do is flatMap the list of lists and then sum the result.
Here is your example simplified, assuming the lines have already been extracted:
import scala.util.Try
def listSum(lines: List[String]): Int = {
(for{
line <- lines
number <- line.split("#").map(n => Try(n.trim.toInt).getOrElse(0))
} yield number).sum
}
listSum(List("12#43#134#bad","13#54#47")) // -> 303
No vars and no mutability needed. Just a nice for-comprehension ;).
And for comparison the solution with flatMap:
def listSum(lines: List[String]): Int = {
lines
.flatMap(_.split("#").map(n => Try(n.trim.toInt).getOrElse(0)))
.sum
}
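As for why the original loop prints nothing useful: on an Array, :+ returns a new array and leaves the receiver unchanged, so the result of arrayOfnumbers :+ number is simply discarded. The following is a minimal sketch of two ways to keep the appended values, not a rewrite of the asker's program. The object and method names are made up for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

object AppendDemo {
  // Option 1: reassign the var, because `:+` builds a new array each time.
  def collectWithReassign(lines: Seq[String]): Array[String] = {
    var numbers = Array[String]()
    for (line <- lines; number <- line.split("#"))
      numbers = numbers :+ number // shorthand: numbers :+= number
    numbers
  }

  // Option 2: use a mutable buffer that is appended to in place.
  def collectWithBuffer(lines: Seq[String]): Array[String] = {
    val numbers = ArrayBuffer[String]()
    for (line <- lines; number <- line.split("#"))
      numbers += number
    numbers.toArray
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("12#43#134", "13#54#47")
    println(collectWithReassign(lines).mkString(" ")) // 12 43 134 13 54 47
    println(collectWithBuffer(lines).mkString(" "))   // 12 43 134 13 54 47
  }
}
```

Reassigning copies the array on every append, so the buffer is usually the better choice when many elements are added.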

How to read plain text file in kotlin?

There may be various ways to read a plain text file in Kotlin.
I want to know what the possible ways are and how I can use them.
1. Using BufferedReader
import java.io.File
import java.io.BufferedReader
fun main(args: Array<String>) {
val bufferedReader: BufferedReader = File("example.txt").bufferedReader()
val inputString = bufferedReader.use { it.readText() }
println(inputString)
}
2. Using InputStream
Read By Line
import java.io.File
import java.io.InputStream
fun main(args: Array<String>) {
val inputStream: InputStream = File("example.txt").inputStream()
val lineList = mutableListOf<String>()
inputStream.bufferedReader().forEachLine { lineList.add(it) }
lineList.forEach{println("> " + it)}
}
Read All Lines
import java.io.File
import java.io.InputStream
fun main(args: Array<String>) {
val inputStream: InputStream = File("example.txt").inputStream()
val inputString = inputStream.bufferedReader().use { it.readText() }
println(inputString)
}
3. Use File directly
import java.io.File
import java.io.BufferedReader
fun main(args: Array<String>) {
val lineList = mutableListOf<String>()
File("example.txt").useLines { lines -> lines.forEach { lineList.add(it) }}
lineList.forEach { println("> " + it) }
}
I think the simplest way to code is using kotlin.text and java.io.File
import java.io.File
fun main(args: Array<String>) {
val text = File("sample.txt").readText()
println(text)
}
The answers above are all based on Kotlin on the JVM. Here is a Kotlin Native way to read text files:
val bufferLength = 64 * 1024
val buffer = allocArray<ByteVar>(bufferLength)
for (i in 1..count) {
val nextLine = fgets(buffer, bufferLength, file)?.toKString()
if (nextLine == null || nextLine.isEmpty()) break
val records = parseLine(nextLine, ',')
val key = records[column]
val current = keyValue[key] ?: 0
keyValue[key] = current + 1
}
fun parseLine(line: String, separator: Char) : List<String> {
val result = mutableListOf<String>()
val builder = StringBuilder()
var quotes = 0
for (ch in line) {
when {
ch == '\"' -> {
quotes++
builder.append(ch)
}
(ch == '\n') || (ch == '\r') -> {}
(ch == separator) && (quotes % 2 == 0) -> {
result.add(builder.toString())
builder.setLength(0)
}
else -> builder.append(ch)
}
}
return result
}
See: https://github.com/JetBrains/kotlin-native/blob/master/samples/csvparser/src/csvParserMain/kotlin/CsvParser.kt
Anisuzzaman's answer lists several possibilities.
The main differences between them are in whether the file is read into memory as a single String, read into memory and split into lines, or read line-by-line.
Obviously, reading the entire file into memory in one go can take a lot more memory, so that's something to avoid unless it's really necessary. (Text files can get arbitrarily big!) So processing line by line with BufferedReader.useLines() is often a good approach.
The remaining differences are mostly historical. Very early versions of Java used InputStream etc., which didn't properly distinguish between characters and bytes; Reader etc. were added to correct that. Java 8 added ways to read line by line more efficiently using streams (e.g. Files.lines()). And more recently, Kotlin has added its own extension functions (e.g. BufferedReader.useLines()), which make it even simpler.
To read a text file, it must first be created. In Android Studio, you would create the text file like this:
1) Select "Project" from the top of the vertical toolbar to open the project "tool window"
2) From the drop-down menu at the top of the "tool window", select "Android"
3) Right-click on "App" and select "New"
then -> "Folder" (the one with the green Android icon beside it)
then -> "Assets Folder"
4) Right-click on the "assets" folder after it appears in the "tool window"
5) Select "New" -> "File"
6) Name the file, and include the extension ".txt" if it is a text file, or ".html" if it is for WebView
7) Edit the file or cut and paste text into it. The file will now display under the "Project" files in the "tool window" and you will be able to double-click it to edit it at any time.
To access this file, use a prefix of "application.assets." followed by someFunction(fileName). For example (in Kotlin):
val fileName = "townNames.txt"
val inputString = application.assets.open(fileName).bufferedReader().use { it.readText() }
val townList: List<String> = inputString.split("\n")
How can I apply the Documents path to this?
fun main(args: Array<String>) {
val inputStream: InputStream = File("example.txt").inputStream()
val inputString = inputStream.bufferedReader().use { it.readText() }
println(inputString)
}

Test if string contains anything from an array of strings (kotlin)

I'm new to Kotlin (I have a Java background) and I can't seem to figure out how to check whether a string contains a match from a list of keywords.
What I want to do is check if a string contains a match from an array of keywords (case-insensitive please). If so, print out the keyword(s) that was matched and the string that contained the keyword. (I will be looping over a bunch of strings in a file).
Here's an MVE for starters:
val keywords = arrayOf("foo", "bar", "spam")
fun search(content: String) {
var match = <return an array of the keywords that content contained>
if(match.size > 0) {
println("Found match(es): " + match + "\n" + content)
}
}
fun main(args: Array<String>) {
var str = "I found food in the barn"
search(str) //should print out that foo and bar were a match
}
As a start (this ignores the 'match' variable and getting a list of the keywords matched), I tried using the following if statement, based on what I found in this question:
if(Arrays.stream(keywords).parallel().anyMatch(content::contains))
but it put a squiggly line under "content" and gave me this error
None of the following functions can be called with the arguments
supplied: public operator fun CharSequence.contains(char: Char,
ignoreCase: Boolean = ...): Boolean defined in kotlin.text public
operator fun CharSequence.contains(other: CharSequence, ignoreCase:
Boolean = ...): Boolean defined in kotlin.text @InlineOnly public
inline operator fun CharSequence.contains(regex: Regex): Boolean
defined in kotlin.text
You can use the filter function to leave only those keywords contained in content:
val match = keywords.filter { it in content }
Here match is a List<String>. If you want an array as the result, you can add a .toTypedArray() call.
The in operator in the expression "it in content" is the same as content.contains(it).
If you want a case-insensitive match, you need to specify the ignoreCase parameter when calling contains:
val match = keywords.filter { content.contains(it, ignoreCase = true) }
Another obvious choice is using a regex doing case-insensitive matching:
arrayOf("foo", "bar", "spam").joinToString(prefix = "(?i)", separator = "|").toRegex()
This glues together a pattern with a prefixed inline (?i) case-insensitive modifier and alternations between the keywords: (?i)foo|bar|spam
Sample Code:
private val keywords = arrayOf("foo", "bar", "spam")
private val pattern = keywords.joinToString(prefix = "(?i)", separator = "|")
private val rx = pattern.toRegex()
fun findKeyword(content: String): ArrayList<String> {
var result = ArrayList<String>()
rx.findAll(content).forEach { result.add(it.value) }
return result
}
fun main(args: Array<String>) {
println(findKeyword("Some spam and a lot of bar"));
}
The regex approach could be handy if you are after some more complex matching, e.g. overlapping or non-overlapping matches, adding word boundaries (\b), etc.
Here is my approach without Streams:
fun String.containsAnyOfIgnoreCase(keywords: List<String>): Boolean {
for (keyword in keywords) {
if (this.contains(keyword, true)) return true
}
return false
}
Usage:
"test string".containsAnyOfIgnoreCase(listOf("abc","test"))
I think any is an efficient way to do this.
fun findMatch(s: String, strings: List<String>): Boolean {
return strings.any { s.contains(it) }
}
fun main() {
val today = "Wednesday"
val weekend = listOf("Sat", "Sun")
println(if (findMatch(today, weekend)) "Yes" else "No") // No
}

How to create associative array of strings ( string[strings] ) from key-value file?

Here is an example of the associative array that I would like to get:
string [string] rlist = ["dima":"first", "masha":"second", "roma":"third"];
The text file that I read has a very simple structure:
peter = fourth
ivan = fifth
david = sixth
string [string] strarr;
string txt = readText("test.txt");
foreach (t;txt.splitLines())
{
// ??
}
Could anybody suggest a way?
It may be me but I find it hard to reason about with a for loop and temp variables, I would rather do something like:
import std.conv;
import std.stdio;
import std.array;
import std.algorithm;
import std.typecons; // provides tuple
void main() {
string[string] dic = File("test")
.byLine
.map!(l => l.to!string.findSplit(" = "))
.map!(l => tuple( l[0], l[2] ))
.assocArray;
}
byLine: read line by line, better than reading the whole thing and then splitting.
first map: split each line into three parts as explained by rcorre
second map: build pairs from the split lines
assocArray: build an associative array from those pairs.
Try the following:
import std.string : splitLines, strip;
import std.file : readText;
import std.algorithm : findSplit;
string[string] strarr;
string txt = readText("test.txt");
foreach(t ; txt.splitLines()) {
auto res = t.findSplit("=");
string key = res[0].strip;
string val = res[2].strip;
strarr[key] = val;
}
findSplit will return three ranges: the part before '=', '=', and the part after '='. strip can be used to remove whitespace around the = that would otherwise be included in the key and value.
If you want a more robust solution, you could consider a D library for reading config/ini files like onyx-config, ctini, or dini.

Not writing to the file Scala

I have the following code, which is supposed to write to a file one line at a time, until it reaches ^EOF.
import java.io.PrintWriter
import java.io.File
object WF {
def writeline(file: String)(delim: String): Unit = {
val writer = new PrintWriter(new File(file))
val line = Console.readLine(delim)
if (line != "^EOF") {
writer.write(line + "\n")
writer.flush()
}
else {
sys.exit()
}
}
}
var counter = 0
val filename = Console.readLine("Enter a file: ")
while (true) {
counter += 1
WF.writeline(filename)(counter.toString + ": ")
}
For some reason, at the console, everything looks like it works fine, but then, when I actually read the file, nothing has been written to it! What is wrong with my program?
Every time you create a new PrintWriter you're wiping out the existing file. Use something like a FileWriter, which allows you to specify that you want to open the file for appending:
val writer = new PrintWriter(new FileWriter(new File(file), true))
That should work, although the logic here is pretty confusing.
Use val writer = new FileWriter(new File(file), true) instead. The second parameter tells the FileWriter to append to the file. See http://docs.oracle.com/javase/6/docs/api/java/io/FileWriter.html
I'm guessing the problem is that you forgot to close the writer.
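Putting the two suggestions above together, here is a minimal sketch that opens the file in append mode for each line and always closes the writer. The object name AppendLines, the method name appendLine, and the temporary file are assumptions made for illustration; the console input loop from the question is left out so the example is self-contained:

```scala
import java.io.{File, FileWriter, PrintWriter}
import scala.io.Source

object AppendLines {
  // Open the file in append mode for a single line and always close the writer.
  def appendLine(file: File, line: String): Unit = {
    val writer = new PrintWriter(new FileWriter(file, true)) // true = append
    try writer.println(line)
    finally writer.close()
  }

  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("append-demo", ".txt") // scratch file for the demo
    file.deleteOnExit()
    Seq("first", "second", "third").foreach(appendLine(file, _))

    // Read the file back to confirm that no line overwrote an earlier one.
    val source = Source.fromFile(file)
    try println(source.getLines().toList) // List(first, second, third)
    finally source.close()
  }
}
```

Reopening the file for every line is simple but wasteful; for a long-running loop it may be preferable to create one writer up front and close it when the end marker is read.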
