Removing whitespace and then empty lines from an array of strings - arrays

I've read a text file into a big string:
fileText = try NSString(contentsOfFile: pathToFile, encoding: String.Encoding.utf8.rawValue) as String
(I omitted the do/catch part and fileText was declared as an optional string constant before the assignment).
Now I split the lines out into an array of strings, trim the whitespace from each, and then remove any empty strings:
let lines = (fileText!.components(separatedBy: "\n")).map { $0.trimmingCharacters(in: .whitespaces)}.filter {$0.count > 0}
And it works fine, but I'm learning Swift 4 and I suspect there's a cleaner way to accomplish my task, right? I would appreciate any examples that put my code to shame. Thanks!

The problem with your code is that it iterates over all of the string's characters four times, while the task can be completed in a single pass. Note that the version below drops every space, not only leading and trailing ones. Something like this:
let myString = String() // String received from any source
var lines = [String]()
var line = ""
myString.forEach {
    switch $0 {
    case " ":
        break // skip spaces
    case "\n":
        if !line.isEmpty {
            lines.append(line)
            line = ""
        }
    default:
        line.append($0)
    }
}
// Flush the last line if the text does not end with "\n"
if !line.isEmpty {
    lines.append(line)
}
print(lines)
For test file:
$ cat test.txt
123123123 12312312
123 12312312 12312312
sfsdfsdfsfsdf
sdfsdf 23234 sdfsdfs 23234
sdfsdf
sdfsdfsdf
Result will be:
$ swift main.swift
["12312312312312312", "1231231231212312312", "sfsdfsdfsfsdf", "sdfsdf23234sdfsdfs23234", "sdfsdf", "sdfsdfsdf"]

Related

How to convert a very long string to array of strings in a most effective way?

I have a text file consisting of repeating string elements like this:
#SRR1582908.1 ILLUMINA_0154:8:1101:3556:1998/1 //1 line
CCTCGCTGGTCGATTTGTTTAACCGTTTTCTGTTCAGCGCCAAAATTATTTT //2 line
+ //3 line
BCCFFFFFHHHHHIIIIIHIHGHHGGIIIIIGHHDBGHHHIIGHIIGIHIII //4 line
and there are 29 million such elements in the file. What I need is to extract the second line from each element into a new array of strings.
I have a function which does the following:
do
{
let textToOpen = try String(contentsOf: chosenFile!, encoding: .utf8)
var arrayOfTextLines = textToOpen.components(separatedBy: "\n")
arrayOfReads = stride(from: 1, to: arrayOfTextLines.endIndex, by: 4).map{(arrayOfTextLines[$0])}
}
catch
{}
The code works correctly, but for a file with 29 million elements it is relatively slow.
The main bottleneck is
var arrayOfTextLines = textToOpen.components(separatedBy: "\n")
Is there a way to speed up the function?

How to split a String into an array? String.split(",") results in unexpected data type/struct [duplicate]

From the documentation, it's not clear. In Java you could use the split method like so:
"some string 123 ffd".split("123");
Use split()
let mut split = "some string 123 ffd".split("123");
This gives an iterator, which you can loop over, or collect() into a vector.
for s in split {
println!("{}", s)
}
let vec = split.collect::<Vec<&str>>();
// OR
let vec: Vec<&str> = split.collect();
There are several simple ways:
By separator:
s.split("separator") | s.split('/') | s.split(char::is_numeric)
By whitespace:
s.split_whitespace()
By newlines:
s.lines()
By regex: (using regex crate)
Regex::new(r"\s").unwrap().split("one two three")
The result of each kind is an iterator:
let text = "foo\r\nbar\n\nbaz\n";
let mut lines = text.lines();
assert_eq!(Some("foo"), lines.next());
assert_eq!(Some("bar"), lines.next());
assert_eq!(Some(""), lines.next());
assert_eq!(Some("baz"), lines.next());
assert_eq!(None, lines.next());
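Because each of these returns an iterator, the usual adapters chain directly onto it. As a minimal sketch (the helper name `non_empty_trimmed_lines` is made up for illustration), this splits text into lines, trims each one, and drops the lines that are empty after trimming:

```rust
/// Hypothetical helper: split `text` into lines, trim surrounding whitespace
/// from each line, and skip lines that are empty after trimming.
fn non_empty_trimmed_lines(text: &str) -> Vec<&str> {
    text.lines()
        .map(|line| line.trim())
        .filter(|line| !line.is_empty())
        .collect()
}

fn main() {
    let text = "  foo \r\nbar\n\n   \n baz\n";
    let lines = non_empty_trimmed_lines(text);
    println!("{:?}", lines); // ["foo", "bar", "baz"]
}
```

The returned slices borrow from `text`, so no line is copied.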
The split method is defined on str, and is therefore also available on String:
fn split<'a, P>(&'a self, pat: P) -> Split<'a, P> where P: Pattern<'a>
Split by char:
let v: Vec<&str> = "Mary had a little lamb".split(' ').collect();
assert_eq!(v, ["Mary", "had", "a", "little", "lamb"]);
Split by string:
let v: Vec<&str> = "lion::tiger::leopard".split("::").collect();
assert_eq!(v, ["lion", "tiger", "leopard"]);
Split by closure:
let v: Vec<&str> = "abc1def2ghi".split(|c: char| c.is_numeric()).collect();
assert_eq!(v, ["abc", "def", "ghi"]);
split returns an Iterator, which you can convert into a Vec using collect: split_line.collect::<Vec<_>>(). Going through an iterator instead of returning a Vec directly has several advantages:
split is lazy. This means that it won't really split the line until you need it. That way it won't waste time splitting the whole string if you only need the first few values: split_line.take(2).collect::<Vec<_>>(), or even if you need only the first value that can be converted to an integer: split_line.filter_map(|x| x.parse::<i32>().ok()).next(). This last example won't waste time attempting to process the "23.0" but will stop processing immediately once it finds the "1".
split makes no assumption on the way you want to store the result. You can use a Vec, but you can also use anything that implements FromIterator<&str>, for example a LinkedList or a VecDeque, or any custom type that implements FromIterator<&str>.
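The two points above can be sketched as follows. The input line and the helper name `first_int` are assumptions made up for this illustration, not part of the original question:

```rust
use std::collections::VecDeque;

/// Hypothetical helper: return the first field of `line` that parses as an i32.
fn first_int(line: &str) -> Option<i32> {
    line.split(' ')
        .filter_map(|field| field.parse::<i32>().ok())
        .next()
}

fn main() {
    let line = "23.0 1 foo 7"; // made-up input for illustration

    // Only the first two fields are produced; the rest is never split.
    let first_two: Vec<&str> = line.split(' ').take(2).collect();
    println!("{:?}", first_two); // ["23.0", "1"]

    // "23.0" is not an i32, so it is skipped; iteration stops at "1".
    println!("{:?}", first_int(line)); // Some(1)

    // Any collection implementing FromIterator<&str> works as the target.
    let deque: VecDeque<&str> = line.split(' ').collect();
    println!("{}", deque.len()); // 4
}
```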
There's also split_whitespace()
fn main() {
let words: Vec<&str> = " foo bar\t\nbaz ".split_whitespace().collect();
println!("{:?}", words);
// ["foo", "bar", "baz"]
}
The OP's question was how to split on a multi-character string. Here is a way to get the results, part1 and part2, as Strings instead of in a vector.
Here the string is split on the non-ASCII string "☄☃🤔" in place of "123":
let s = "☄☃🤔"; // also works with non-ASCII characters
let mut part1 = "some string ☄☃🤔 ffd".to_string();
let _t;
let part2;
if let Some(idx) = part1.find(s) {
    part2 = part1.split_off(idx + s.len());
    _t = part1.split_off(idx);
}
else {
    part2 = "".to_string();
}
gets: part1 = "some string "
      part2 = " ffd"
If "☄☃🤔" is not found, part1 contains the untouched original String and part2 is empty.
Rosetta Code has a nice example, "Split a character string based on change of character", of a short solution using split_off:
fn main() {
    let mut part1 = "gHHH5YY++///\\".to_string();
    if let Some(mut last) = part1.chars().next() {
        let mut pos = 0;
        while let Some(c) = part1.chars().find(|&c| {if c != last {true} else {pos += c.len_utf8(); false}}) {
            let part2 = part1.split_off(pos);
            print!("{}, ", part1);
            part1 = part2;
            last = c;
            pos = 0;
        }
    }
    println!("{}", part1);
}
It solves the following task:
Split a (character) string into comma (plus a blank) delimited strings based on a change of character (left to right).
If you are looking for the Python-flavoured split where you tuple-unpack the two ends of the split string, you can do
if let Some((a, b)) = line.split_once(' ') {
// ...
}
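As a small sketch of how split_once can also produce two owned Strings, here is a hypothetical helper (`split_pair` is a name invented for this example). It assumes a toolchain recent enough to provide `str::split_once`, and it chooses to return the whole input plus an empty second part when the separator is missing:

```rust
/// Hypothetical helper: split `line` at the first occurrence of `sep`,
/// returning owned strings. If `sep` is absent, the whole line is returned
/// as the first part and the second part is empty.
fn split_pair(line: &str, sep: &str) -> (String, String) {
    match line.split_once(sep) {
        Some((a, b)) => (a.to_string(), b.to_string()),
        None => (line.to_string(), String::new()),
    }
}

fn main() {
    let (a, b) = split_pair("some string 123 ffd", "123");
    println!("{:?} {:?}", a, b); // "some string " " ffd"
}
```

The separator itself is not included in either part.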

how to delete an element that contains a letter in array using swift

I am having trouble figuring this out. As the title says, how do I delete an element that contains a letter from an array? This is the code I have so far.
let newline = "\n"
let task = Process()
task.launchPath = "/bin/sh"
task.arguments = ["-c", "traceroute -nm 18 -q 1 8.8.8.8"]
let pipe = Pipe()
task.standardOutput = pipe
task.launch()
let data = pipe.fileHandleForReading.readDataToEndOfFile()
let output = NSString(data: data, encoding: String.Encoding.utf8.rawValue) as! String
var array = output.components(separatedBy: " ")
array = array.filter(){$0 != "m"}
print(array, newline)
I have tried multiple options given by this stack overflow.
How to remove an element from an array in Swift
I think I have hit a wall.
Have you tried
array = array.filter({ !$0.contains("m") })

In Ruby, if array string equals any of the second array string, then how can I delete from file or array?

I'm trying to read a file of key words (1 word per line), and I put each word into an array. I am trying to delete any of the words that match the 'useless words' array, so my code reads:
useless_words = ["a","the","at","we","are","be","i","what", "want","you","oh","u"]
word1 = []
puts file2.count
file2.each do |file|
word1 << file
end
word1.each do |word|
useless_words.each do |word2|
if word.to_s == word2.to_s true
word1.delete_if { |word|
word.to_s == word2.to_s
}
end
end
end
puts word1.count
My word1 count should be slightly less than file1 count, however my response is:
file2.count = 39
and word1.count = 0
Can someone tell me what I'm doing wrong?
OK, your code is pretty obscure to me. It would be nice if you added more comments next time and/or used a better naming convention for your variables.
I think your problem is that the words you have in word1 still contain the trailing \n.
Here is my solution
#!/usr/bin/env ruby
useless_words = ["a","the","at","we","are","be","i","what", "want","you","oh","u"]
words_in_file = []
File.open("file.txt", "r") do |infile|
while (line = infile.gets)
words_in_file << line.strip
end
end
puts "Words in file: #{words_in_file}"
useful_words = words_in_file.reject { |w| useless_words.include?(w) }
puts "Useful words in file: #{useful_words}"
It gives me this result for my file:
Words in file: ["abc", "def", "to", "a", "we", "are", "we"]
Useful words in file: ["abc", "def", "to"]
An improved solution is:
#!/usr/bin/env ruby
useless_words = ["a","the","at","we","are","be","i","what", "want","you","oh","u"]
useful_words = []
File.open("file.txt", "r") do |infile|
while (line = infile.gets)
strip_line = line.strip
useful_words << strip_line unless useless_words.include?(strip_line)
end
end
Thanks to @T.Aoukar for his comments.
That code is rather messy; let me try to write a cleaner version for you:
useless_words = ['a','the','at','we','are','be','i','what', 'want','you','oh','u']
words = []
# strip each line so the trailing "\n" does not prevent a match
File.open(path,'r'){|f| words = f.readlines.map(&:strip)}
words = words - useless_words
puts words.count
words - useless_words returns a new array containing every element of the first array (the words from the file) that does not appear in the second array (useless_words).
Hope this is the answer you're looking for.

Swift - Sorting url content into an array

So i have managed to download the content of this webpage - http://www.puzzlers.org/pub/wordlists/unixdict.txt
and i am trying to sort each word into an array like so -
let url = NSURL(string: "http://www.puzzlers.org/pub/wordlists/unixdict.txt")
let task = NSURLSession.sharedSession().dataTaskWithURL(url!) {
(data, respons, error) in
if error == nil {
var urlContent = NSString(data: data, encoding: NSUTF8StringEncoding) as String
var wordArr = urlContent.componentsSeparatedByString(" ")
println(wordArr)
}
}
task.resume()
}
But the problem is i am not getting each word into an array because the words are not separated by a "Space". All i'm getting is the content but as 1 item in the array.
How would i go about separating the content into an array of words. Separating the words by line breaks?
Thank you in advance :)
The preferred way is enumerateLinesUsingBlock with NSString
var urlContent = NSString(data: data, encoding: NSUTF8StringEncoding)!
var lines:[String] = []
urlContent.enumerateLinesUsingBlock { line, _ in
lines.append(line)
}
println(lines)
In this specific case, you can simply split by "\n":
var urlContent = NSString(data: data, encoding: NSUTF8StringEncoding) as String
var lines = urlContent.componentsSeparatedByString("\n")
if lines.last == "" {
// The last one is empty string
lines.removeLast()
}
But in general, the line separator may not be "\n". See the documentation:
A line is delimited by any of these characters, the longest possible
sequence being preferred to any shorter:
U+000D (\r or CR)
U+2028 (Unicode line separator)
U+000A (\n or LF)
U+2029 (Unicode paragraph separator)
\r\n, in that order (also known as CRLF)
enumerateLinesUsingBlock can handle all of them.

Resources