How to build a str in Rust [duplicate] - arrays

This question already has answers here:
How to handle "borrowed value does not live long enough" error when finding the longest substring of consecutive equal characters?
(2 answers)
Closed last month.
Suppose I have a char in the variable c and a positive int in the variable n. I want to build the str containing c occurring n times. How can I do it?
I tried building it as a String, and maybe I just got dizzy trying to read the documentation on strings, but I couldn't see how to convert it to a str. But then if I'm trying to just build it as a str directly then I couldn't see how to do that either.
For context, here is the full function I'm trying to implement. It takes a string and finds the longest sequence of consecutive characters (and breaks ties by taking the first that occurs).
pub fn longest_sequence(s: &str) -> Option<&str> {
    if s.len() == 0 { return None; }
    let mut current_c = s.as_bytes()[0] as char;
    let mut greatest_c = s.as_bytes()[0] as char;
    let mut current_num = 0;
    let mut greatest_num = 0;
    for ch in s.chars() {
        if current_c == ch {
            current_num += 1;
            if current_num > greatest_num {
                greatest_num = current_num;
                greatest_c = current_c;
            }
        } else {
            current_num = 1;
            current_c = ch;
        }
    }
    // Now build the output str ...
}

I think there are a couple of misconceptions about str vs String.
str can never exist alone. It is always used behind a pointer, such as &str (or Box<str> or *const str, but in your case those shouldn't matter).
&str does not own any data. It is merely a reference to (part of) string data stored elsewhere, for example in a String or a string literal.
String actually holds data.
So when you want to return data, use String; if you want to reference existing data, return &str.
There is no way to return a &str that borrows from a local String: the data has to be stored somewhere, &str doesn't store it, and the local String is dropped when the function returns. (For completeness' sake: yes, you could leak it, but that would create a permanent string in memory that never goes away.)
So in your case there are two ways:
Reference the input &str, because somewhere its data is already stored.
Return a String instead.
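The two options above can be sketched as follows. This is only a minimal illustration, not the function from the question; the names first_run and repeat_char are hypothetical.

```rust
/// Option 1: borrow from the input. The returned &str points into `s`,
/// so nothing is allocated and the result lives as long as `s` does.
fn first_run(s: &str) -> &str {
    let first = match s.chars().next() {
        Some(c) => c,
        None => return "",
    };
    // Byte offset of the first char that differs from `first`
    // (or the end of the string if every char is the same).
    let end = s
        .char_indices()
        .find(|&(_, c)| c != first)
        .map(|(pos, _)| pos)
        .unwrap_or(s.len());
    &s[..end]
}

/// Option 2: build new data and hand ownership to the caller.
fn repeat_char(c: char, n: usize) -> String {
    std::iter::repeat(c).take(n).collect()
}
```

For example, first_run("aaabcc") borrows "aaa" from its argument, while repeat_char('x', 3) allocates a fresh "xxx".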
As a side note: do not use s.as_bytes()[0] as char, as it only works for ASCII input. Rust strings are UTF-8 encoded, so the first byte is not necessarily a whole character.
Here is one possible solution:
pub fn longest_sequence(s: &str) -> Option<&str> {
    let mut current_c = s.chars().next()?;
    let mut current_start = 0;
    let mut current_len = 0;
    let mut greatest: &str = "";
    let mut greatest_len = 0;
    for (pos, ch) in s.char_indices() {
        if current_c == ch {
            current_len += 1;
        } else {
            if greatest_len < current_len {
                greatest = &s[current_start..pos];
                greatest_len = current_len;
            }
            current_len = 1;
            current_c = ch;
            current_start = pos;
        }
    }
    if greatest_len < current_len {
        greatest = &s[current_start..];
    }
    Some(greatest)
}
pub fn main() {
    let s = "🤪😁😁😁😉€€🤪🤪";
    let seq = longest_sequence(s);
    println!("{:?}", seq);
}
Some("😁😁😁")
Some explanations:
No need to check for an empty string; s.chars().next()? does so automatically.
Use s.chars().next() instead of s.as_bytes()[0] as char, as the latter is not UTF-8 compatible.
I explicitly store greatest_len instead of using greatest.len() because greatest.len() is also not UTF-8 compatible: it gives you the size of the string in bytes, not in chars.
You stored the new largest string whenever a new char of the same value was found; I had to move it to the case where the char type changed (and once after the loop), because we don't yet know the end of the current char. Again, note that &s[current_start..current_start+current_len] wouldn't work, because &s[ .. ] wants indices in bytes, but current_len is in chars. So we need to wait for another char to know where the previous one ended.
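The difference between byte offsets and char counts mentioned above can be made visible with a small sketch. The helper names byte_and_char_len and char_starts are hypothetical and exist only for this illustration.

```rust
/// Returns (length in bytes, length in chars) of a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Returns the byte offset at which each char of `s` starts.
/// These are the only positions at which `&s[a..b]` may be cut.
fn char_starts(s: &str) -> Vec<usize> {
    s.char_indices().map(|(pos, _)| pos).collect()
}
```

For "😁😁😁" the first helper reports 12 bytes but 3 chars, which is why a char count cannot be used as a slice bound. Slicing at an offset that is not returned by char_starts panics at run time.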
Another solution, based on your code, would be:
pub fn longest_sequence(s: &str) -> Option<String> {
    let mut current_c = s.chars().next()?;
    let mut greatest_c = current_c;
    let mut current_num = 0;
    let mut greatest_num = 0;
    for ch in s.chars() {
        if current_c == ch {
            current_num += 1;
            if current_num > greatest_num {
                greatest_num = current_num;
                greatest_c = current_c;
            }
        } else {
            current_num = 1;
            current_c = ch;
        }
    }
    // Build the output String
    Some(std::iter::repeat(greatest_c).take(greatest_num).collect())
}
pub fn main() {
    let s = "🤪😁😁😁😉€€🤪🤪";
    let seq = longest_sequence(s);
    println!("{:?}", seq);
}
Some("😁😁😁")

To convert a String to &'static str you need to leak it like this:
fn leak(s: String) -> &'static str {
let ptr = s.as_str() as *const str;
core::mem::forget(s);
unsafe {&*ptr}
}
And char to String:
fn cts(c: char, n: usize) -> String {
(0..n)
.map(|_| c)
.collect()
}
So char to &'static str basically will look like this:
fn conv(c: char, n: usize) -> &'static str {
leak(cts(c, n))
}
I do not recommend leaking the String, though; just use it as is.
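For comparison, here is a sketch of the same two steps using only safe standard-library calls (str::repeat, String::into_boxed_str and Box::leak). The function names repeat_with_str and leak_boxed are hypothetical; the leaked memory is still never freed, so the caveat above applies unchanged.

```rust
/// Builds a String containing `c` repeated `n` times.
fn repeat_with_str(c: char, n: usize) -> String {
    c.to_string().repeat(n)
}

/// Leaks a String without `unsafe`. The allocation is intentionally
/// never released, which is what makes the 'static lifetime valid.
fn leak_boxed(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}
```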

Related

How do I initialize flate2::GzEncoder outside of a loop?

This outputs the raw and GZip compressed length line by line as a way to estimate string complexity:
use std::fs::File;
use std::io::{BufReader, BufRead, Read};
use flate2::{read, Compression};
fn main() {
let mut f = File::open("/etc/passwd").unwrap();
let mut f = BufReader::new(f);
let mut _buf = vec![0u8; 100];
for line in f.lines() {
let l = line.unwrap();
let p = l.as_bytes().len();
let mut e = read::GzEncoder::new(l.as_bytes(), Compression::default());
let q = e.read(&mut _buf).unwrap();
println!("raw = {}, zip = {}", p, q);
}
}
I suspect that calling GzEncoder::new in every iteration might be expensive and want to move it outside the loop. How do I do that using flate2?

Remove Adjacent Duplicates in string slice

I have a problem statement to write an in-place function to eliminate the adjacent duplicates in a string slice.
I came up with the following code
func main() {
tempData := []string{"abc", "abc", "abc", "def", "def", "ghi"}
removeAdjacentDuplicates(tempData)
fmt.Println(tempData)
}
func removeAdjacentDuplicates(data []string) {
for j := 1; j < len(data); {
if data[j-1] == data[j] {
data = append(data[:j], data[j+1:]...)
} else {
j++
}
}
fmt.Println(data)
}
The output is following
[abc def ghi]
[abc def ghi ghi ghi ghi]
My doubt is, if in the function, slice is modified, then in the calling function, why isn't the slice giving correct results?
Also, any article to understand the slices (and the underlying array) much better would be very helpful.
The func removeAdjacentDuplicates takes the slice "as if" it were a reference to tempData.
The capacity and length of tempData in main() stay the same for the lifetime of the program.
In removeAdjacentDuplicates, each time a duplicate is found the following elements are shifted one position to the left, so the final value "ghi" is copied from the end to end - 1. The memory at the end of the slice therefore holds repeated "ghi" values.
When control returns to main, the program prints the now-modified slice tempData. Because it was passed to the function in a way similar to a reference, it is this memory that was modified. The function call did not make a copy of the memory.
You can see this behaviour by looking at cap() and len() as the program runs:
package main
import (
"fmt"
)
func main() {
tempData := []string{"abc", "abc", "abc", "def", "def", "ghi"}
removeAdjacentDuplicates(tempData)
fmt.Println(tempData,cap(tempData),len(tempData))
}
func removeAdjacentDuplicates(data []string) {
for j := 1; j < len(data); {
if data[j-1] == data[j] {
data = append(data[:j], data[j+1:]...)
fmt.Println(data,cap(data),len(data))
} else {
j++
}
}
fmt.Println(data, cap(data),len(data))
}
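In your code, removeAdjacentDuplicates wants to mutate the slice passed as an argument. This is not really possible: the function receives a copy of the slice header, so it can change the elements of the shared backing array but not the length that the caller sees.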
This function should return the new slice, just like append does.
func removeAdjacentDuplicates(data []string) []string{
for j := 1; j < len(data); {
if data[j-1] == data[j] {
data = append(data[:j], data[j+1:]...)
} else {
j++
}
}
return data
}
If you really want to mutate the argument, it is possible but you need to pass a pointer to a slice *[]string
Go 1.18 and above
You can use slices.Compact exactly for this.
Compact replaces consecutive runs of equal elements with a single copy. This is like the uniq command found on Unix. Compact modifies the contents of the slice s; it does not create a new slice.
func main() {
data := []string{"abc", "abc", "abc", "def", "def", "ghi"}
data = slices.Compact(data)
fmt.Println(data) // [abc def ghi]
}
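When this answer was written, the package was golang.org/x/exp/slices, which was still experimental; since Go 1.21 the same function is available in the standard library package slices. If you don't want to import an exp package, you can copy the source: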
func Compact[S ~[]E, E comparable](s S) S {
if len(s) == 0 {
return s
}
i := 1
last := s[0]
for _, v := range s[1:] {
if v != last {
s[i] = v
i++
last = v
}
}
return s[:i]
}
If the slice's elements aren't comparable, use slices.CompactFunc, which works the same way but also takes an equality function.
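The write-index technique used by Compact is not specific to Go. As a rough cross-language sketch (in Rust rather than Go, with the hypothetical name compact), the function below takes the vector by value and returns the shortened one, mirroring the "return the new slice" advice above. Rust's standard library already offers Vec::dedup for this, so the sketch is for illustration only.

```rust
/// Removes adjacent duplicates, keeping the first element of each run.
fn compact<T: PartialEq>(mut v: Vec<T>) -> Vec<T> {
    if v.is_empty() {
        return v;
    }
    // `write` is the length of the compacted prefix built so far.
    let mut write = 1;
    for read in 1..v.len() {
        if v[read] != v[write - 1] {
            // Move the new value to the end of the prefix. The element
            // swapped out is a leftover that will be truncated or skipped.
            v.swap(write, read);
            write += 1;
        }
    }
    // Shrink the length, just like `return s[:i]` in the Go version.
    v.truncate(write);
    v
}
```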
Try this function:
func deleteAdjacentDuplicate(slice []string) []string {
for i := 1; i < len(slice); i++ {
if slice[i-1] == slice[i] {
copy(slice[i:], slice[i+1:]) //copy [4] where there is [3, 4] => [4, 4]
slice = slice[:len(slice)-1] //removes last element
i-- //avoid advancing counter
}
}
return slice
}

Check if Char Array contains special sequence without using string library on Unix in C

Let's assume we have a char array and a sequence. Next we would like to check whether the char array contains the sequence WITHOUT the <string.h> library: if yes -> return true; if no -> return false.
bool contains(char *Array, char *Sequence) {
// CONTAINS - Function
for (int i = 0; i < sizeof(Array); i++) {
for (int s = 0; s < sizeof(Sequence); s++) {
if (Array[i] == Sequence[i]) {
// How to check if Sequence is contained ?
}
}
}
return false;
}
// in Main Function
char *Arr = "ABCDEFG";
char *Seq = "AB";
bool contained = contains(Arr, Seq);
if (contained) {
printf("Contained\n");
} else {
printf("Not Contained\n");
}
Any ideas, suggestions, websites ... ?
Thanks in advance,
Regards, from ∆
The simplest way is the naive search function:
for (i = 0; i < lenS1; i++) {
    for (j = 0; j < lenS2; j++) {
        // compare at offset i + j; the terminating zero of arr stops any overrun
        if (arr[i + j] != seq[j]) {
            break; // seq is not present in arr at position i!
        }
    }
    if (j == lenS2) {
        return true;
    }
}
Note that you cannot use sizeof here: it is evaluated at compile time on the type of its operand, while the length you seek is only known at run time. Applied to a char pointer, sizeof returns the pointer size, so almost certainly four or eight whatever strings you use. You need to calculate the string lengths explicitly, which in C is done by relying on the fact that the last character of a string is a zero:
lenS1 = 0;
while (arr[lenS1]) lenS1++;
lenS2 = 0;
while (seq[lenS2]) lenS2++;
An obvious and easy improvement is to limit i between 0 and lenS1 - lenS2 (inclusive), and if lenS1 < lenS2, immediately return false. Obviously if you haven't found "HELLO" in "WELCOME" by the time you've gotten past the 'L', there's no chance of the five-character HELLO ever being contained in the four-character remainder COME:
if (lenS1 < lenS2) {
    return false; // You will never find "PEACE" in "WAR".
}
lenS1minuslenS2 = lenS1 - lenS2;
for (i = 0; i <= lenS1minuslenS2; i++)
Further improvements depend on your use case.
Looking for the same sequence among lots of arrays, looking for different sequences always in the same array, looking for lots of different sequences in lots of different arrays - all call for different optimizations.
The length and distribution of characters within both array and sequence also matter a lot, because if you know that there only are (say) three E's in a long string and you know where they are, and you need to search for HELLO, there's only three places where HELLO might fit. So you needn't scan the whole "WE WISH YOU A MERRY CHRISTMAS, WE WISH YOU A MERRY CHRISTMAS AND A HAPPY NEW YEAR" string. Actually you may notice there are no L's in the array and immediately return false.
A balanced option for an average use case (it does have pathological cases) might be supplied by the Boyer-Moore string matching algorithm (C source and explanation supplied at the link). This has a setup cost, so if you need to look for different short strings within very large texts, it is not a good choice (there is a parallel-search version which is good for some of those cases).
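To give a feel for the "setup cost, then skip ahead" idea, here is a sketch of Boyer-Moore-Horspool, a simplified variant that uses only a bad-character shift table. It is not the full Boyer-Moore algorithm referred to above, it is written in Rust rather than C, and the name horspool_find is hypothetical.

```rust
/// Returns the byte offset of the first occurrence of `needle` in
/// `haystack`, or None if there is none (Boyer-Moore-Horspool).
fn horspool_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let m = needle.len();
    let n = haystack.len();
    if m == 0 {
        return Some(0);
    }
    if m > n {
        return None;
    }
    // Setup cost: for every byte value, how far the window may move when
    // that byte is aligned with the last position of the needle.
    let mut shift = [m; 256];
    for (i, &b) in needle[..m - 1].iter().enumerate() {
        shift[b as usize] = m - 1 - i;
    }
    let mut pos = 0;
    while pos + m <= n {
        if &haystack[pos..pos + m] == needle {
            return Some(pos);
        }
        // Skip ahead based on the haystack byte under the needle's last slot.
        pos += shift[haystack[pos + m - 1] as usize];
    }
    None
}
```

With a needle like "HELLO" and a text containing no 'L' or 'O' near the window end, most steps move the window by the full needle length, which is where the speed-up over the naive search comes from.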
This is not the most efficient algorithm but I do not want to change your code too much.
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

size_t mystrlen(const char *str)
{
const char *end = str;
while(*end++);
return end - str - 1;
}
bool contains(char *Array, char *Sequence) {
// CONTAINS - Function
bool result = false;
size_t s, i;
size_t arrayLen = mystrlen(Array);
size_t sequenceLen = mystrlen(Sequence);
if(sequenceLen <= arrayLen)
{
for (i = 0; i < arrayLen; i++) {
for (s = 0; s < sequenceLen; s++)
{
if (Array[i + s] != Sequence[s])
{
break;
}
}
if(s == sequenceLen)
{
result = true;
break;
}
}
}
return result;
}
int main()
{
char *Arr = "ABCDEFG";
char *Seq = "AB";
bool contained = contains(Arr, Seq);
if (contained)
{
printf("Contained\n");
}
else
{
printf("Not Contained\n");
}
}
Basically this is strstr
const char* strstrn(const char* orig, const char* pat, int n)
{
    const char* it = orig;
    do
    {
        const char* tmp = it;
        const char* tmp2 = pat;
        if (*tmp == *tmp2) {
            while (*tmp == *tmp2 && *tmp != '\0') {
                tmp++;
                tmp2++;
            }
            // count a match only if the whole pattern was consumed
            if (*tmp2 == '\0' && n-- == 0)
                return it;
        }
    } while (*it++ != '\0');
    return NULL;
}
The above returns a pointer to the n-th match (counting from 0) of the substring in the string, or NULL if there are fewer matches.

Compare two arrays and return the index of the first appearance

I have a task to do, and I was thinking about it, but I can't come up with the right answer.
In a language of your choosing, write a function that gets a string named str and a string named set.
The function will return the index of the first appearance of any char from set in str.
For example:
str = "hellohellohellohelloistom!"
set = "t98765!"
The function will return 22 (index of 't' in str).
Make sure that time complexity is not larger than the length of both strings - O(m+n)
Assume that the string only contains ASCII characters.
I was thinking about it and I thought about doing it with divide and conquer. I have a base case that is always O(1) and then I divide the problem into smaller problems until I get the answer. The problem is that with that solution the complexity will be O(log n).
The other approach I thought of was to make a Set. But I still don't really know how to approach this problem. Any ideas?
This program is written in Swift (Swift 2 syntax). Note that it runs in O(m*n) rather than O(m+n):
let str = "hellohellohellohelloistom!"
let set = "t98765!"
func findFirstAppearance(str : String , set : String) -> Int? {
    // Scan str from the left; the first character that occurs in set wins.
    for (indexOfChar, strCharacter) in str.characters.enumerate() {
        for setCharacter in set.characters {
            if strCharacter == setCharacter {
                return indexOfChar
            }
        }
    }
    return nil
}
print(findFirstAppearance(str, set: set))
print(findFirstAppearance("helloWorld", set: "546Wo"))
Or another solution, using rangeOfString from Foundation:
let str = "hellohellohellohelloistom!"
let set = "t98765!"
func findFirstAppearance(str : String , set : String) -> Int? {
    var index : Int?
    for setCharacter in set.characters {
        if let range = str.rangeOfString(String(setCharacter)) {
            let candidate = str.startIndex.distanceTo(range.startIndex)
            // Keep the smallest index seen so far instead of stopping at the first hit.
            if index == nil || candidate < index! {
                index = candidate
            }
        }
    }
    return index
}
print(findFirstAppearance(str, set: set))
print(findFirstAppearance("helloWorld", set: "546Wo"))
Note :
if any character is not found then it will return nil
it's case sensitive comparison
Hope this will solve your problem.
Since all the strings involved contain only ASCII characters then using constant memory this can be solved in O(LengthOf(str) + LengthOf(set)).
Here is the code in "C" Language:
//ReturnValues:
//-1 : if no occurrence of any character of set is found in str
//value >=0 : index of character in str.
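int FindFirstOccurenceOfAnyCharacterInSet(const char *str, const char *set, int *index_of_set)
{
    // int (not char) so that indexes above 127 fit and -1 is representable
    // even on platforms where plain char is unsigned
    int hash[256];
    int i = 0;
    while(i < 256)
    {
        hash[i] = -1;
        ++i;
    }
    i = 0;
    while(set[i] != '\0')
    {
        // the cast avoids a negative array index where plain char is signed
        hash[(unsigned char)set[i]] = i;
        ++i;
    }
    i = 0;
    while(str[i] != '\0')
    {
        if(hash[(unsigned char)str[i]] != -1)
        {
            *index_of_set = hash[(unsigned char)str[i]];
            return i;
        }
        ++i;
    }
    *index_of_set = -1;
    return -1;
}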
Logic works by recording the position/indexes (which are >=0) of all the characters of set in hash table and then parsing str and checking whether the current character of str is present in hash table.
index_of_set will also report the index of character in set which is found in str. If index_of_set = -1 then no occurrence was found.
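The same lookup-table idea can be sketched in Rust. This is an illustrative version under the question's assumption that both inputs are ASCII; the name find_first_of is hypothetical, and unlike the C code it only records membership, not the position inside set.

```rust
/// Returns the index in `s` of the first byte that also occurs in `set`.
/// Runs in O(len(s) + len(set)) with a fixed 256-entry table.
fn find_first_of(s: &str, set: &str) -> Option<usize> {
    // One pass over `set` marks which byte values are wanted.
    let mut in_set = [false; 256];
    for b in set.bytes() {
        in_set[b as usize] = true;
    }
    // One pass over `s` stops at the first marked byte.
    s.bytes().position(|b| in_set[b as usize])
}
```

For the example from the question, find_first_of("hellohellohellohelloistom!", "t98765!") yields Some(22).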
Thanks for the help!!
Here is also the code in C#.
Cheers,
public static int FindFirstOccurenceOfAnyCharacterInSet(string str, string set)
{
    // Lookup table indexed by ASCII code; -1 means "not in set".
    var hash = new int[256];
    for (int i = 0; i < hash.Length; i++)
    {
        hash[i] = -1;
    }
    // C# strings are not NUL-terminated, so iterate by Length.
    for (int i = 0; i < set.Length; i++)
    {
        hash[set[i]] = i;
    }
    for (int i = 0; i < str.Length; i++)
    {
        if (hash[str[i]] != -1)
        {
            return i;
        }
    }
    return -1;
}

How to modify the last item of an array?

Since arr is borrowed as mutable, the length of arr can't be gotten by calling len(). I'm stuck here, what's the right way to do it?
fn double_last(arr: &mut[i32]) -> &i32 {
let last = &mut arr[arr.len() - 1]; // borrow checker error.
//let last = &mut arr[3]; // fine
*last *= 2;
last
}
fn main() {
let mut a = [1,2,3,4];
println!("{}", double_last(&mut a));
println!("{:?}", a);
}
If you only need the last, you can use std::slice::last_mut
fn double_last(arr: &mut[i32]) -> &i32 {
let last = arr.last_mut().unwrap();
*last *= 2;
last
}
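Note that unwrap() panics on an empty slice. A possible variant that reports the empty case to the caller instead is sketched below; the changed return type Option<&i32> and the name double_last_checked are assumptions made for this illustration, not part of the question.

```rust
/// Doubles the last element and returns a shared reference to it,
/// or None if the slice is empty.
fn double_last_checked(arr: &mut [i32]) -> Option<&i32> {
    // `?` returns None early when there is no last element.
    let last = arr.last_mut()?;
    *last *= 2;
    // Reborrow immutably for the return value.
    Some(&*last)
}
```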
This has since been fixed by the introduction of non-lexical lifetimes and the accompanying borrow checker changes: on current compilers the original line compiles as written.
On older compilers, you can satisfy the borrow checker by splitting that calculation out:
let n = arr.len() - 1;
let last = &mut arr[n];
