Padding printed output of tabular data - arrays

I know this is probably dead simple, but I've got some data such as this in one file:
Artichoke
Green Globe, Imperial Star, Violetto
24" deep
Beans, Lima
Bush Baby, Bush Lima, Fordhook, Fordhook 242
12" wide x 8-10" deep
that I'd like to format into a nice TSV-style table, something like this:
Name        | Varieties     | Container Data
------------|---------------|---------------
some data here, nicely padded with even spacing and right-aligned text

Try String#rjust(width):
"hello".rjust(20) #=> "               hello"

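Here is the same idea applied to the question's data. Each cell is padded to the longest entry in its column. The dashed rule under the header reuses the same widths. This is a minimal sketch that assumes every row has the same number of fields. The helper name table_lines is hypothetical. Swapping rjust for String#ljust left-aligns the text instead.

```ruby
# Pad every cell to the width of its column (longest cell in that column,
# header included) and join the cells with " | ".
def table_lines(header, rows)
  all_rows = [header] + rows
  widths = header.each_index.map { |i| all_rows.map { |r| r[i].length }.max }
  pad = ->(row) { row.each_with_index.map { |cell, i| cell.rjust(widths[i]) }.join(" | ") }
  # Dashed rule under the header, built from the same column widths
  separator = widths.map { |w| "-" * w }.join("-|-")
  [pad.(header), separator, *rows.map(&pad)]
end

rows = [
  ["Artichoke", "Green Globe, Imperial Star, Violetto", %q(24" deep)],
  ["Beans, Lima", "Bush Baby, Bush Lima, Fordhook, Fordhook 242", %q(12" wide x 8-10" deep)]
]
puts table_lines(["Name", "Varieties", "Container Data"], rows)
```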
I wrote a gem to do exactly this: http://tableprintgem.com

No one has mentioned the "coolest", most compact way: the % operator, for example "%10s %10s" % [1, 2]. Here is some code:
xs = [
  ["This code", "is", "indeed"],
  ["very", "compact", "and"],
  ["I hope you will", "find", "it helpful!"],
]
m = Array.new(xs.first.size, 0) # one width per column, starting at 0
xs.each { |row| row.each_with_index { |e, i| s = e.size; m[i] = s if s > m[i] } }
xs.each { |x| puts m.map { |w| "%#{w}s" }.join(" " * 5) % x }
Gives:
      This code          is          indeed
           very     compact             and
I hope you will        find     it helpful!
Here is the code made more readable:
# One width per column, starting at 0
max_lengths = Array.new(xs.first.size, 0)
xs.each do |x|
  x.each_with_index do |e, i|
    s = e.size
    max_lengths[i] = s if s > max_lengths[i]
  end
end
# Build the format string once, e.g. "%15s     %7s     %11s"
format = max_lengths.map { |w| "%#{w}s" }.join(" " * 5)
xs.each do |x|
  puts format % x
end
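Here is a variation on the same % technique. The format string is built once, and the - flag left-aligns each field instead of right-aligning it. This sketch assumes all rows have equal length, because Array#transpose raises IndexError otherwise. The method name format_rows is illustrative only.

```ruby
# Left-aligned variant: "%-15s" pads on the right instead of the left.
def format_rows(xs, gap = 5)
  # transpose turns rows into columns; the widest cell sets each column width
  widths = xs.transpose.map { |col| col.map(&:size).max }
  fmt = widths.map { |w| "%-#{w}s" }.join(" " * gap)
  # rstrip drops the padding after the last column
  xs.map { |row| (fmt % row).rstrip }
end

xs = [
  ["This code", "is", "indeed"],
  ["very", "compact", "and"],
  ["I hope you will", "find", "it helpful!"],
]
puts format_rows(xs)
```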

This is a reasonably full example that assumes the following:
- your list of products is in a file called veg.txt;
- each record spans three consecutive lines, one field per line.
I am a bit of a noob to Ruby, so there are undoubtedly better and more elegant ways to do this.
#!/usr/bin/ruby
class Vegetable
  @@max_name ||= 0
  @@max_variety ||= 0
  @@max_container ||= 0
  attr_reader :name, :variety, :container
  def initialize(name, variety, container)
    @name = name
    @variety = variety
    @container = container
    @@max_name = set_max(@name.length, @@max_name)
    @@max_variety = set_max(@variety.length, @@max_variety)
    @@max_container = set_max(@container.length, @@max_container)
  end
  def set_max(current, max)
    current > max ? current : max
  end
  def self.max_name
    @@max_name
  end
  def self.max_variety
    @@max_variety
  end
  def self.max_container
    @@max_container
  end
end
products = []
File.open("veg.txt") do |file|
  while name = file.gets
    name = name.strip
    variety = file.gets.to_s.strip
    container = file.gets.to_s.strip
    veg = Vegetable.new(name, variety, container)
    products << veg
  end
end
format = "%#{Vegetable.max_name}s\t%#{Vegetable.max_variety}s\t%#{Vegetable.max_container}s\n"
printf(format, "Name", "Variety", "Container")
printf(format, "----", "-------", "---------")
products.each do |p|
  printf(format, p.name, p.variety, p.container)
end
The following sample file
Artichoke
Green Globe, Imperial Star, Violetto
24" deep
Beans, Lima
Bush Baby, Bush Lima, Fordhook, Fordhook 242
12" wide x 8-10" deep
Potatoes
King Edward, Desiree, Jersey Royal
36" wide x 8-10" deep
Produced the following output
Name Variety Container
---- ------- ---------
Artichoke Green Globe, Imperial Star, Violetto 24" deep
Beans, Lima Bush Baby, Bush Lima, Fordhook, Fordhook 242 12" wide x 8-10" deep
Potatoes King Edward, Desiree, Jersey Royal 36" wide x 8-10" deep
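If the three-lines-per-record layout holds, Enumerable#each_slice(3) can replace the three gets calls per iteration. This is a minimal sketch, not a drop-in replacement for the class above. VegRecord and parse_vegetables are hypothetical names. The sample text is the first two records of the file above, inlined so the example runs without veg.txt. It left-aligns with String#ljust and separates columns with spaces rather than tabs, so the columns line up regardless of the terminal's tab stops.

```ruby
VegRecord = Struct.new(:name, :variety, :container)

# Group consecutive lines into [name, variety, container] triples.
# to_s mirrors the original's file.gets.to_s for an incomplete last record.
def parse_vegetables(lines)
  lines.map(&:strip).each_slice(3).map do |name, variety, container|
    VegRecord.new(name, variety.to_s, container.to_s)
  end
end

sample = <<~TEXT
  Artichoke
  Green Globe, Imperial Star, Violetto
  24" deep
  Beans, Lima
  Bush Baby, Bush Lima, Fordhook, Fordhook 242
  12" wide x 8-10" deep
TEXT

vegs = parse_vegetables(sample.lines) # with a real file: File.readlines("veg.txt")
widths = VegRecord.members.map { |m| vegs.map { |v| v[m].length }.max }
vegs.each { |v| puts v.to_a.zip(widths).map { |s, w| s.ljust(w) }.join("  ").rstrip }
```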

Another gem: https://github.com/visionmedia/terminal-table
Terminal Table is a fast and simple, yet feature rich ASCII table generator written in Ruby.

I have a little function to print a 2D array as a table. Each row must have the same number of columns for this to work. It's also easy to tweak to your needs.
def print_table(table)
  # Calculate widths: the longest cell in each column
  widths = []
  table.each do |line|
    line.each_with_index do |col, c|
      widths[c] = (widths[c] && widths[c] > col.length) ? widths[c] : col.length
    end
  end
  # Left-align the last column.
  last = widths.pop
  format = widths.map { |n| "%#{n}s" }.join(" ")
  format += " %-#{last}s\n"
  # Print each line.
  table.each do |line|
    printf format, *line
  end
end
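print_table needs every row to have the same number of columns. One way to relax that is to pad short rows first. This sketch assumes missing trailing cells should print as blanks. normalize_rows is a hypothetical helper. It also converts every cell with to_s, so numeric cells don't break the col.length call.

```ruby
# Pad ragged rows with "" so every row has as many cells as the longest one,
# converting all cells to strings along the way.
def normalize_rows(table)
  ncols = table.map(&:size).max || 0
  table.map { |row| row.map(&:to_s) + [""] * (ncols - row.size) }
end

rows = normalize_rows([["Name", "Qty"], ["Artichoke", 3], ["Beans"]])
widths = rows.transpose.map { |col| col.map(&:length).max }
p rows    # prints [["Name", "Qty"], ["Artichoke", "3"], ["Beans", ""]]
p widths  # prints [9, 3]
```

The normalized rows can then be passed to print_table unchanged.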

Kernel.sprintf should get you started.
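For instance, a field width pads a value to that many characters, and the - flag switches from right to left alignment. The strings here are just sample values.

```ruby
# "%-10s" left-aligns in a 10-character field; "%5s" right-aligns in 5.
name_cell = sprintf("%-10s|", "Name")   # => "Name      |"
qty_cell  = sprintf("%5s|", "42")       # => "   42|"
puts name_cell + qty_cell               # prints "Name      |   42|"
```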


Combine arrays with conditions in Ruby

I have a class People with three properties
class People
attr_accessor :first_name, :last_name, :age
end
And I have two arrays:
a = [p1, p2]
b = [p3, p4]
Is there an easy way to combine these two arrays into a new array, removing items that match a condition like:
p1.first_name + p1.last_name == p3.first_name + p3.last_name
and afterwards have all the items belong to array a?
For example:
p1.first_name = "Ada"
p1.last_name = "Wang"
p1.age = 28
p2.first_name = "Leon"
p2.last_name = "S"
p2.age = 28
p3.first_name = "Ada"
p3.last_name = "Wang"
p3.age = 18
p4.first_name = "Mario"
p4.last_name = "M"
p4.age = 80
The result should be [p1], the 28-year-old Ada Wang.
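I'm not sure I understand your goal, but this may be an option:
c = a + b
c.uniq! { |e| [e.first_name, e.last_name] }
Call Array#uniq! with a block on c, which is the concatenation of a and b. The block returns the [first_name, last_name] pair, so two people count as duplicates only when both names match. uniq! keeps the first occurrence, so entries from a win over matching entries from b.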
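Here is a quick runnable check of the uniq approach, using the sample people from the question. It uses a Struct as a stand-in for the People class. That is an assumption: any object with first_name and last_name readers behaves the same way.

```ruby
# Struct stands in for the People class (same three attributes).
Person = Struct.new(:first_name, :last_name, :age)

p1 = Person.new("Ada", "Wang", 28)
p2 = Person.new("Leon", "S", 28)
p3 = Person.new("Ada", "Wang", 18)
p4 = Person.new("Mario", "M", 80)
a = [p1, p2]
b = [p3, p4]

# The [first, last] pair is the uniqueness key; the first occurrence (from a)
# is kept, so p3 is dropped in favour of p1.
a = (a + b).uniq { |e| [e.first_name, e.last_name] }
p a.map(&:age)  # prints [28, 28, 80]
```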
Alternatively, push each person from b onto a unless a already contains someone with the same first and last name:
b.each do |p|
  duplicate = a.any? { |q| q.first_name == p.first_name && q.last_name == p.last_name }
  a.push(p) unless duplicate # p's first/last name pair is not yet in a, so it can be added
end
To check other fields, replace first_name / last_name with the other attributes.
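If a and b are large, the any? scan above is quadratic. Here is a sketch of the same merge using a Set of [first_name, last_name] keys, so each lookup is constant time. It also drops duplicates within b itself. merge_people! is a hypothetical name, and the Struct again stands in for People.

```ruby
require "set"

Person = Struct.new(:first_name, :last_name, :age)

# Append to a every person from b whose name pair is not already present.
def merge_people!(a, b)
  seen = a.map { |person| [person.first_name, person.last_name] }.to_set
  b.each do |person|
    # Set#add? returns nil when the key was already in the set
    a << person if seen.add?([person.first_name, person.last_name])
  end
  a
end

a = [Person.new("Ada", "Wang", 28), Person.new("Leon", "S", 28)]
b = [Person.new("Ada", "Wang", 18), Person.new("Mario", "M", 80)]
merge_people!(a, b)
p a.map(&:age)  # prints [28, 28, 80]
```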

using lookup tables to plot a ggplot and table

I'm creating a Shiny app and I'm letting the user choose which data should be displayed in a plot and a table. The choice is made through three input variables that have 14, 4 and 2 choices, respectively.
ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "DataSource", label = "Data source",
                choices = c("Restoration plots", "all semi natural grasslands")),
    # choicesVariables definition is omitted here, because it's very long, but it
    # contains 14 string values
    selectInput(inputId = "Variabel", label = "Variable",
                choices = choicesVariables),
    selectInput(inputId = "Factor", label = "Factor",
                choices = c("Company type", "Region and type of application",
                            "Approved or not approved applications", "Age group"))
  ),
  dashboardBody(
    plotOutput("thePlot"),
    tableOutput("theTable")
  ))
This adds up to 73 combinations (yes, I know 14 x 4 x 2 doesn't give 73, but some combinations are invalid). I would like to handle this with a lookup table, so I created one with every valid combination of choices, like this:
rad1 <- c(rep("Company type", 20), rep("Region and type of application", 20),
          rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2 <- choicesVariables[c(1:14, 1, 4, 5, 9, 10, 11,
                           1:14, 1, 4, 5, 9, 10, 11,
                           1:7, 9:14,
                           1:14, 1, 4, 5, 9, 10, 11)]
rad3 <- c(rep("Restoration plots", 14), rep("all semi natural grasslands", 6),
          rep("Restoration plots", 14), rep("all semi natural grasslands", 6),
          rep("Restoration plots", 27), rep("all semi natural grasslands", 6))
rad4 <- 1:73
letaLista <- data.frame(rad1, rad2, rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the row matching the user's choice. But how do I use this information to draw the plot and table without a 73-line ifelse statement?
I tried to create some sort of multidimensional array that could hold all the tables (and another for the plots), but I couldn't make it work. My experience with these kinds of arrays is limited and this might be a simple issue, but any hints would be helpful!
The dataset behind the plots and table is a data frame with 23 variables, both factors and numeric. The plots and tables are then created with code like the following, repeated for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar = "Dist_brukcentrum",
                  groupvars = "Companytype")
s_A1 <- s_A1[2:6, ]
p_A1 <- ggplot(s_A1, aes(x = Companytype, y = Dist_brukcentrum)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = Dist_brukcentrum - se, ymax = Dist_brukcentrum + se),
                width = .2, position = position_dodge(.9)) +
  scale_y_continuous(name = "") + scale_x_discrete(name = "")
where summarySE is the following function, borrowed from the Cookbook for R:
library(plyr)  # summarySE uses plyr's ddply() and rename()
summarySE <- function(data = NULL, measurevar, groupvars = NULL, na.rm = TRUE,
                      conf.interval = .95, .drop = TRUE) {
  # New version of length which can handle NA's: if na.rm == TRUE, don't count them
  length2 <- function(x, na.rm = FALSE) {
    if (na.rm) sum(!is.na(x))
    else length(x)
  }
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop = .drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm = na.rm),
                     mean = mean(xx[[col]], na.rm = na.rm),
                     sd   = sd(xx[[col]], na.rm = na.rm)
                   )
                 },
                 measurevar
  )
  # Rename the "mean" column
  datac <- rename(datac, c("mean" = measurevar))
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval:
  # e.g., if conf.interval is .95, use .975 (above/below), and use df = N-1
  ciMult <- qt(conf.interval / 2 + .5, datac$N - 1)
  datac$ci <- datac$se * ciMult
  return(datac)
}
The full code is a bit too large to post, but I hope this clarifies what I'm trying to do.
Well, thanks to florian's comment I think I might have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I collected the plots (ggplot objects, which are themselves lists) into one list and the tables into another:
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id, which selects the right plot and table from those lists.
output$thePlot <- renderPlot({
  plotValue <- subset(letaLista, Factor == input$Factor &
    Variabel == input$Variabel & rest_alla == input$DataSource)
  # [[ ]] extracts the ggplot object itself; [ ] would return a one-element list
  plotList[[as.integer(plotValue[1, 4])]]
})
output$theTable <- renderTable({
  plotValue <- subset(letaLista, Factor == input$Factor &
    Variabel == input$Variabel & rest_alla == input$DataSource)
  # Return the data frame; renderTable displays the value of this expression
  tableList[[as.integer(plotValue[1, 4])]]
})

Sort by a key, but value has more than one element using Scala

I'm very new to Scala on Spark and wondering how you might create key-value pairs where the key has more than one element. For example, I have this dataset of baby names:
Year, Name, County, Number
2000, JOHN, KINGS, 50
2000, BOB, KINGS, 40
2000, MARY, NASSAU, 60
2001, JOHN, KINGS, 14
2001, JANE, KINGS, 30
2001, BOB, NASSAU, 45
I want to find the most frequently occurring name for each county, regardless of the year. How might I go about doing that?
I did accomplish this using a loop (see below), but I'm wondering if there is a shorter way that takes better advantage of Spark with Scala (i.e., can I decrease computation time?).
val names = sc.textFile("names.csv")
  .filter(!_.startsWith("Year"))   // skip the header row
  .map(_.split(",").map(_.trim))   // "2000, JOHN, KINGS, 50" -> Array("2000", "JOHN", "KINGS", "50")
val uniqueCounty = names.map(x => x(2)).distinct.collect
for (county <- uniqueCounty) {
  // Sum Number (column 3) per name within this county, largest first
  val eachCounty = names.filter(x => x(2) == county)
    .map(l => (l(1), l(3).toInt))
    .reduceByKey(_ + _)
    .sortBy(-_._2)
  println("County: " + county + " " + eachCounty.first)
}
Here is a solution using RDDs. I am assuming you need the top-occurring name per county.
val data = Array((2000, "JOHN", "KINGS", 50),(2000, "BOB", "KINGS", 40),(2000, "MARY", "NASSAU", 60),(2001, "JOHN", "KINGS", 14),(2001, "JANE", "KINGS", 30),(2001, "BOB", "NASSAU", 45))
val rdd = sc.parallelize(data)
// Sum Number per (county, name) combination key
val uniqNamePerCountyRdd = rdd.map(x => ((x._3, x._2), x._4)).reduceByKey(_ + _)
// Group names per county
val countyNameRdd = uniqNamePerCountyRdd.map(x => (x._1._1, (x._1._2, x._2))).groupByKey()
// Sort by total, descending, and keep only the top name per county
countyNameRdd.mapValues(x => x.toList.sortBy(-_._2).take(1)).collect
Output:
res8: Array[(String, List[(String, Int)])] = Array((KINGS,List((JOHN,64))), (NASSAU,List((MARY,60))))
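The two-step logic of the RDD pipeline above can be checked on the sample rows without Spark: first sum per composite key, then keep the maximum per county. This is a plain-Ruby illustration only, not Spark code. The rows are the question's sample data.

```ruby
rows = [
  [2000, "JOHN", "KINGS", 50], [2000, "BOB", "KINGS", 40], [2000, "MARY", "NASSAU", 60],
  [2001, "JOHN", "KINGS", 14], [2001, "JANE", "KINGS", 30], [2001, "BOB", "NASSAU", 45]
]

# Step 1: total Number per (County, Name), the composite key
totals = Hash.new(0)
rows.each { |_year, name, county, number| totals[[county, name]] += number }

# Step 2: keep the name with the largest total in each county
best = {}
totals.each do |(county, name), total|
  best[county] = [name, total] if best[county].nil? || total > best[county][1]
end
p best # best is {"KINGS" => ["JOHN", 64], "NASSAU" => ["MARY", 60]}
```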
You could use spark-csv and the DataFrame API. If you are using Spark 2.0, it is slightly different: Spark 2.0 has a native CSV data source based on spark-csv.
Use spark-csv to load your csv file into a Dataframe.
import java.io.File
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(new File(getClass.getResource("/names.csv").getFile).getAbsolutePath)
df.show
Gives output:
+----+----+------+------+
|Year|Name|County|Number|
+----+----+------+------+
|2000|JOHN| KINGS| 50|
|2000| BOB| KINGS| 40|
|2000|MARY|NASSAU| 60|
|2001|JOHN| KINGS| 14|
|2001|JANE| KINGS| 30|
|2001| BOB|NASSAU| 45|
+----+----+------+------+
DataFrames provide a set of operations for structured data manipulation. You could use some basic operations to get your result:
import org.apache.spark.sql.functions._
df.select("County","Number").groupBy("County").agg(max("Number")).show
Gives output:
+------+-----------+
|County|max(Number)|
+------+-----------+
|NASSAU| 60|
| KINGS| 50|
+------+-----------+
Is this what you are trying to achieve?
Notice the import org.apache.spark.sql.functions._, which is needed for the max() function used inside agg().
More information about Dataframes API
EDIT
For correct output:
df.registerTempTable("names")
//there is probably a better query for this
sqlContext.sql("SELECT * FROM (SELECT Name, County,count(1) as Occurrence FROM names GROUP BY Name, County ORDER BY " +
"count(1) DESC) n").groupBy("County", "Name").max("Occurrence").limit(2).show
Gives output:
+------+----+---------------+
|County|Name|max(Occurrence)|
+------+----+---------------+
| KINGS|JOHN| 2|
|NASSAU|MARY| 1|
+------+----+---------------+

How to build an array comprised of two others using only particular elements of each?

I'm writing a little program to generate some bogus top-ten sales numbers for book sales. I'm trying to do this as compactly as possible, and without using MySQL or another DB.
I have written out what I want to happen. I've created a bogus catalog array and a bogus sales array whose entries refer to catalog entries by index. That part all works great.
I want to create a third array that includes all the titles from the catalog array with the sales numbers from the sales array, like a join in a DB, but without any DB. I can't figure out how to do that part. I think once I have it I can sort it the way I want, but building that third array is killing me. I cannot figure out what I'm doing wrong or how to do it right.
So given the following code:
require 'random_word'
class BestOnline
  def initialize
    @catalog = Array.new
    @sales = Array.new
    @topten = Array.new
    inventory = rand(50) + 10
    days = rand(1..50)
    now = Time.now
    yesterday = now - 86400
    saleshistory = now - (days * 86400)
    (1..inventory).each do
      @catalog << {
        :title => "#{RandomWord.adjs.next.capitalize} #{RandomWord.nouns.next.capitalize}",
        :price => rand(5.99..29.99).round(2)}
    end
    (0..days).each do
      @sales << {
        :id => rand(0...@catalog.count), # exclusive range: only valid catalog indexes
        :salescount => rand(0..24),
        :date => rand(saleshistory..now) }
    end
  end
  def bestsellers
    @sales.each do
      # THIS DOESNT WORK AND I'M STUCK AS HOW TO FIX IT.
      # @topten << {
      #   :title => @catalog[:id],
      #   :salescount => @sales[:salescount]
      # }
    end
    puts @topten.group_by{ |tt| tt[:salescount]}.sort_by{ |k,v| -k}.first(10)
  end
end
BestOnline.new.bestsellers
How can I create a third array that contains the titles and number of sales and output the result of the top-ten books sold?
Try this out:
def bestsellers
  @sales.each do |sale|
    @topten << {
      title: @catalog[sale[:id]][:title],
      salescount: sale[:salescount] }
  end
  @topten.sort! { |x, y| y[:salescount] <=> x[:salescount] }
  puts @topten.first(10)
end
I suggest you write:
def bestsellers(sales)
  sales.max_by(10) { |h| h[:salescount] }
end
puts bestsellers(sales)
Enumerable#max_by has accepted an optional count argument since Ruby 2.2.
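The same :id can appear in several sales entries, so ranking individual sales can list one book more than once. This sketch totals sales per catalog index first, then joins the totals to the catalog titles. It assumes the @catalog/@sales hash shapes from the question. The sample titles are placeholders, and top_titles is a hypothetical name.

```ruby
# Sum salescount per catalog index, then take the n largest totals with titles.
def top_titles(catalog, sales, n = 10)
  totals = Hash.new(0)
  sales.each { |sale| totals[sale[:id]] += sale[:salescount] }
  totals.max_by(n) { |_id, count| count }
        .map { |id, count| { title: catalog[id][:title], salescount: count } }
end

catalog = [{ title: "Red Fox" }, { title: "Blue Moon" }, { title: "Old Tree" }]
sales = [
  { id: 0, salescount: 5 }, { id: 1, salescount: 7 },
  { id: 0, salescount: 4 }, { id: 2, salescount: 1 }
]
puts top_titles(catalog, sales, 2)
```

Inside the class, this would be called as top_titles(@catalog, @sales).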
There are several problems with the way you've structured your code. Now that you have running code (by incorporating @fbonds66's answer), I suggest you post it at SO's sister site Code Review. The purpose of CR is to suggest improvements to working code. If you read through some of the questions and answers there, I think you will be impressed.
I was dereferencing incorrectly when trying to build the third array from the first two. This works:
@sales.each do |sale|
  @topten << {
    :title => @catalog[sale[:id]][:title],
    :salescount => sale[:salescount]
  }
end
I needed to work on each sale hash yielded by .each as |sale|, and use the correct syntax to look up what I was after in the other arrays.

Ruby match string in array

I have a string like this: "Men's Beech River Cable T-Shirt". How can I get the category from this string?
str1 = "Men's Beech River Cable T-Shirt"
str2 = "MEN'S GOOSE EYE MOUNTAIN DOWN VEST"
cat1 = str1.split.last # T-Shirt
cat2 = str2.split.last # VEST
TOPS = %w(jacket vest coat blazer parka sweater shirt polo t-shirt)
Desired result:
category_str1 = "Tops" # Since T-Shirt (shirt) is in the TOPS constant.
category_str2 = "Tops" # Since vest is in the TOPS constant.
I don't know how to describe my problem better; I hope the example makes it clear.
str = "Men's Beech River Cable T-Shirt"
cat_orig = str.split.last # T-Shirt
TOPS = %w(jacket vest coat blazer parka sweater shirt polo)
RE_TOPS = Regexp.union(TOPS)
category = "Tops" if RE_TOPS =~ cat_orig.downcase
Note that there are no commas in the %w() array syntax.
str = "Men's Beech River Cable T-Shirt"
cat_orig = str.split.last # T-Shirt
TOPS = %w(jacket vest coat blazer parka sweater shirt polo) # suppressed the comma to get a clean array
category = "Tops" if !cat_orig[/(#{TOPS.join("|")})/i].nil?
Joining the TOPS array builds an alternation regex of the form:
(jacket|vest|coat|blazer|parka|sweater|shirt|polo)
If any of those words is present in cat_orig, String#[] returns the matched word; otherwise it returns nil.
Note the trailing i flag on the regex, which makes it case-insensitive.
The best way to do this is with a hash, not an array. Let's say your categories look something like this:
categories = { "TOPS" => ["shirt", "coat", "blazer"],
"COOKING" => ["knife", "fork", "pan"] }
We can then loop through each category and check whether its values include any word in the string:
categories.each do |key, value|
  puts key if str.downcase.split(' ').any? { |word| value.include?(word) }
end
This prints every category whose word list contains a word from the string.
Note: This does not yet search for substrings.
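To cover the substring case, so that "t-shirt" matches "shirt", the hash lookup can call String#include? on the whole lowercased string instead of comparing whole words. This is a minimal sketch. The CATEGORIES hash is hypothetical and built from the word lists above. Plain substring matching can over-match (for example "pan" inside "company"), so a \b word-boundary regex may be safer in practice.

```ruby
# Hypothetical category table, built from the word lists in this thread.
CATEGORIES = {
  "Tops" => %w[jacket vest coat blazer parka sweater shirt polo],
  "Cooking" => %w[knife fork pan]
}.freeze

# Return the first category whose words occur anywhere in str, or nil.
def category_for(str, categories = CATEGORIES)
  text = str.downcase
  categories.find { |_name, words| words.any? { |w| text.include?(w) } }&.first
end

puts category_for("Men's Beech River Cable T-Shirt")    # prints Tops
puts category_for("MEN'S GOOSE EYE MOUNTAIN DOWN VEST") # prints Tops
```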
