How to use a self made Type in F#? - arrays

I made a type, but I don't know how to use it properly, and I couldn't find any solution on Google.
type Sample =
    {
        TrackPosition : int
        TubePosition : int
        Barcode : string
    }
let arraySamples = Array.create Scenario.Samples.NumberOfSamples Sample
BarcodeGenerieren.Samples.Sample

let mutable trackPosition = Scenario.Samples.StartTrackPositions
let mutable index = 1

for i in 1 .. Scenario.Samples.NumberOfSamples do
    let randomNumber = System.Random().Next(0,9999)
    if index > 24 then
        trackPosition <- trackPosition + 1
        index <- 1
    arraySamples.[index] <- new Sample{TrackPosition= trackPosition, TubePosition = index, Barcode = sprintf "100%s%06d" ((trackPosition + 1) - Scenario.Samples.StartTrackPositions) randomNumber}
So my question is: what should I change so that it works, both where I give the type to the array and where I assign a sample with data to the array?

You have created what is referred to as a record type. You can initialise it with the following syntax:
{ TrackPosition = 0; TubePosition = 0; Barcode = "string" }
Your syntax in the last line is almost correct; it should be
arraySamples.[index] <- {
    TrackPosition = trackPosition;
    TubePosition = index;
    Barcode = sprintf "100%s%06d" ((trackPosition + 1) - Scenario.Samples.StartTrackPositions) randomNumber }
The changes are:
Eliminate new and the Sample name before the braces
Replace , with ;

Related

Brute Force Transposition

Hello, I have an assignment that I can't figure out. The tasks for the assignment are:
Make a loop that tries to decrypt the ciphertext with all possible keys, one at a time.
On each iteration, each individual word is looked up in the dictionary. If 85% of the words are found in the dictionary, it is probably the right key for the current run, and the loop must be broken.
Decrypt the text with the found key and print it.
I have code that takes all the words from a dictionary and counts them. I have linked the csv file. Hope you can help me.
import csv
import pickle
import math

# Build a dictionary of words from the csv file
orddict = {}
item = 0

with open('alle_dkord.csv', 'r', encoding='utf-8') as file:
    reader = csv.reader(file, delimiter=';')
    for row in reader:
        orddict[row[0].upper()] = row[1]

print(len(orddict))

# Save the word list for later use
pkfile = open('wordlist.pkl', 'ab')
pickle.dump(orddict, pkfile)
pkfile.close()

def main():
    msg = "This is a cypher text"
    kryptmsg = "Ta h ticesyx ptihse r"
    key = 8
    krypteret_tekst = krypter(key, msg)
    print(krypteret_tekst)
    dekrypteret_tekst = dekrypter(key, kryptmsg)
    print(dekrypteret_tekst)

# Columnar transposition encryption
def krypter(key, msg):
    ciffer_string = [""] * key
    for kolonne in range(key):
        curIndex = kolonne
        while curIndex < len(msg):
            ciffer_string[kolonne] += msg[curIndex]
            curIndex += key
    return ''.join(ciffer_string)

# Columnar transposition decryption
def dekrypter(key, kryptmsg):
    numKolonner = int(math.ceil(len(kryptmsg) / float(key)))
    numRows = key
    numOfGreyBox = (numKolonner * numRows) - len(kryptmsg)
    plaintekst = [''] * numKolonner
    kolonne = 0
    row = 0
    for symbol in kryptmsg:
        plaintekst[kolonne] += symbol
        kolonne += 1
        if (kolonne == numKolonner) or (kolonne == numKolonner - 1 and row >= numRows - numOfGreyBox):
            kolonne = 0
            row += 1
    return ''.join(plaintekst)

if __name__ == '__main__':
    main()
The csv file
I have tried to make a loop, but it didn't work.
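One way such a loop could look, as a minimal sketch that reuses the orddict word list and the dekrypter function above; the max_key upper bound and the plain whitespace word-splitting are my own assumptions, not part of the assignment:

# Minimal brute-force sketch (assumes orddict and dekrypter from above are in scope;
# max_key is an assumed upper bound on the key length to try)
def brute_force(kryptmsg, orddict, max_key=20):
    for key in range(1, max_key + 1):
        kandidat = dekrypter(key, kryptmsg)       # try decrypting with this key
        ord_liste = kandidat.upper().split()      # naive whitespace split into words
        if not ord_liste:
            continue
        fundne = sum(1 for o in ord_liste if o in orddict)
        if fundne / len(ord_liste) >= 0.85:       # 85% of words found -> probably the right key
            print("Probable key:", key)
            print(kandidat)                       # decrypt and print with the found key
            return key
    return None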

Create array of "deep" struct (scalar) fields

How can I collapse the values of "deep" struct fields into arrays by just indexing?
In the example below, I can only do it for the "top-most" level, and for "deeper" levels I get the error:
"Expected one output from a curly brace or dot indexing expression, but there were XXX results."
The only workaround I found so far is to unfold the operation into several steps, but the deeper the structure the uglier this gets...
clc; clear variables;
% Dummy data
my_struc.points(1).fieldA = 100;
my_struc.points(2).fieldA = 200;
my_struc.points(3).fieldA = 300;
my_struc.points(1).fieldB.subfieldM = 10;
my_struc.points(2).fieldB.subfieldM = 20;
my_struc.points(3).fieldB.subfieldM = 30;
my_struc.points(1).fieldC.subfieldN.subsubfieldZ = 1;
my_struc.points(2).fieldC.subfieldN.subsubfieldZ = 2;
my_struc.points(3).fieldC.subfieldN.subsubfieldZ = 3;
my_struc.info = 'Note my_struc has other fields besides "points"';
% Get all fieldA values by just indexing (this works):
all_fieldA_values = [my_struc.points(:).fieldA]
% Get all subfieldM values by just indexing (doesn't work):
% all_subfieldM_values = [my_struc.points(:).fieldB.subfieldM]
% Ugly workaround:
temp_array_of_structs = [my_struc.points(:).fieldB];
all_subfieldM_values = [temp_array_of_structs.subfieldM]
% Get all subsubfieldZ values by just indexing (doesn't work):
% all_subsubfieldZ_values = [my_struc.points(:).fieldC.subfieldN.subsubfieldZ]
% Ugly workaround:
temp_array_of_structs1 = [my_struc.points(:).fieldC];
temp_array_of_structs2 = [temp_array_of_structs1.subfieldN];
all_subsubfieldZ_values = [temp_array_of_structs2.subsubfieldZ]
Output:
all_fieldA_values =
100 200 300
all_subfieldM_values =
10 20 30
all_subsubfieldZ_values =
1 2 3
Thanks for any help!
You can use arrayfun to have access to each individual 'point', and then access its data. This will return an array with the same dimensions as my_struc.points:
all_subfieldM_values = arrayfun(@(in) in.fieldB.subfieldM, my_struc.points)
all_subsubfieldZ_values = arrayfun(@(in) in.fieldC.subfieldN.subsubfieldZ, my_struc.points)
Not optimal, but at least it's one line.

Change an array to an integer percentage value

I have a function that, when run, gives me a value in the form of an array as its output, but I need the output as an integer percentage for a results piece.
def pred_datsci(file_path):
    prev_precompute = learn.precompute
    learn.precompute = False
    try:
        trn_tfms, val_tfms = tfms_from_model(arch, sz)
        test_img = open_image(file_path)
        im = val_tfms(test_img)
        pred = learn.predict_array(im[None])
        class_index = (np.exp(pred))
        class_index1 = np.argmax(np.exp(pred))
        print(class_index*100)
        return data.classes[class_index1]
    finally:
        learn.precompute = prev_precompute
This is what the output looks like:
pred_datsci(f"data/dogscats1/valid/dogs/12501.jpg")
My question is: how do I get these two values to display as:
Cat % = 15.81724%
Dog % = 84.18274%
You can use the zip function:
for z in zip(['Cat', 'Dog'], [15.3, 84.6]):
    print('%s %% = %s%%' % (z[0], z[1]))

Scala read only certain parts of file

I'm trying to read an input file in Scala whose structure I know; however, I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue: this leaves me with a huge array (we're talking 20 GB of data). Not only have I found myself forced to write some very ugly code in order to convert between RDD[Array[String]] and Array[String], it has essentially made my code useless.
I've tried different approaches and mixes of
.map()
.flatMap() and
.reduceByKey()
however nothing actually puts my collected "cells" into the format I need them to be in.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep hold of the stock_symbol, as that is the identifier I'm counting. So far my attempts have been to turn the entire thing into an array and only collect every 9th index from it into a collected_cells var. The issue is, based on my calculations and real-life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkNum {

  def main(args: Array[String]) {
    // Do some Scala voodoo
    val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))

    // Set input file as per HDFS structure + input args
    val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
    val fields = lines.map(line => line.split(","))

    var collected_cells: Array[String] = new Array[String](0)
    //println("[MESSAGE] Length of CC: " + collected_cells.length)

    val divider: Long = 9
    val array_length = fields.count / divider
    val casted_length = array_length.toInt

    val indexedFields = fields.zipWithIndex
    val indexKey = indexedFields.map{ case (k, v) => (v, k) }

    println("[MESSAGE] Number of lines: " + array_length)
    println("[MESSAGE] Casted length of: " + casted_length)

    for (i <- 1 to casted_length) {
      println("[URGENT DEBUG] Processing line " + i + " of " + casted_length)

      var index = 9 * i - 8
      println("[URGENT DEBUG] Index defined to be " + index)

      collected_cells :+ indexKey.lookup(index)
    }

    println("[MESSAGE] collected_cells size: " + collected_cells.length)

    val single_cells = collected_cells.flatMap(collected_cells => collected_cells);
    val counted_cells = single_cells.map(cell => (cell, 1)).reduceByKey{ case (x, y) => x + y }

    // val result = counted_cells.reduceByKey((a,b) => (a+b))
    // val inmem = counted_cells.persist()
    //
    // // Collect driver into file to be put into user archive
    // inmem.saveAsTextFile("path to server location")
    // ==> Not necessary to save the result as processing time is recorded, not output
  }
}
The bottom part is currently commented out as I tried to debug it, but it acts as pseudo-code for what I need done. I should point out that I am barely familiar with Scala at all, so things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array of the size of the data. That expression represents a transformation of the base data. It can be further transformed until we reduce the data to the information set we desire.
In this case, we want the stock_symbol field of a record encoded as CSV:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange that starts with an alphabetic character. We will do this before we do any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write the types of the variables, to have peace of mind that we have the data types that we expect. As we progress in our Scala skills that might become less important. Let's rewrite the expression above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
In Spark 2.0, CSV support for DataFrames and Datasets is available out of the box.
Given that our data does not have a header row with the field names (as is usual in large datasets), we will need to provide the column names:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "close", "volume", "price")
We can answer our questions very easily now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))

Merge random elements of array/split into chunks

How can I split an array into chunks with some special algorithm? E.g. I need to shorten an array to a size of 10 elements. If I have an array of 11 elements, I want two adjacent elements to get merged. If I have an array of 13 elements, I want three pairs merged. And so on. Is there any solution?
Sample #1
var test = ['1','2','3','4','5','6','7','8','9','10','11'];
Need result = [['1'],['2'],['3'],['4'],['5|6'],['7'],['8'],['9'],['10'],['11']]
Sample #2
var test = ['1','2','3','4','5','6','7','8','9','10','11','12','13'];
Need result = [['1|2'],['3'],['4'],['5'],['6'],['7|8'],['9'],['10'],['11'],['12|13']]
Thank you in advance.
The following code most probably does what you want.
function condense(a){
  var source = a.slice(),
      len = a.length,
      excessCount = (len - 10) % 10,
      step = excessCount - 1 ? Math.floor(10/(excessCount-1)) : 0,
      groupSize = Math.floor(len / 10),
      template = Array(10).fill()
                          .map((_,i) => step ? i%step === 0 ? groupSize + 1
                                                            : i === 9 ? groupSize + 1
                                                                      : groupSize
                                             : i === 4 ? groupSize + 1
                                                       : groupSize);
  return template.map(e => source.splice(0,e)
                                 .reduce((p,c) => p + "|" + c));
}
var test1 = ['1','2','3','4','5','6','7','8','9','10','11'],
test2 = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21'];
console.log(condense(test1));
console.log(condense(test2));
A - Find the difference and create thus many random numbers for merge and put in array
B - loop through initial numbers array.
B1 - if iterator number is in the merge number array (with indexOf), you merge it with the next one and increase iterator (to skip next one as it is merged and already in results array)
B1 example:
var mergers = [2, 7, 10];
// in loop when i = 2
if (mergers.indexOf(i) > -1) { // true
  var newVal = array[i] + "|" + array[i+1]; // will merge 2 and 3 to "2|3"
  i++; // adds 1, so i = 3; next loop is with i = 4
}
C - put new value in results array
You can try this code
jQuery(document).ready(function(){
var test = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16'];
var arrays = [];
var checkLength = test.length;
var getFirstSet = test.slice(0,10);
var getOthers = test.slice(10,checkLength);
$.each( getFirstSet, function( key,value ) {
if(key in getOthers){
values = value +'|'+ getOthers[key];
arrays.push(values);
}else{
arrays.push(value);
}
});
console.log(arrays);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
