How to custom-order week-year columns in Power BI?

I have an issue with custom ordering of data.
Here is an example.
I would like the columns to be sorted first by year and then by week.
I want "1 2022" to be at the end of the line.
I have tried adding a few things.
Is there a way to sort this out by adding a new table with a weekYear column and a sortOrder column,
which I would have to populate with custom values as well?
I would prefer the solution to be dynamic.
If you have any solution, please advise!

I suggest:
Create a table from your column names
Split the column by the space delimiter
Change the column types to number
Sort by Year and then by Week
Change the column types back to text
Combine the columns with the space delimiter
Use this new, sorted list to re-order the original columns
I did this as a custom function:
(tbl as table, optional colsToSort as list) =>
let
    colHdrs = if colsToSort = null then Table.ColumnNames(tbl) else colsToSort,
    hdrCol = Table.FromList(colHdrs, Splitter.SplitTextByDelimiter(" ")),
    //type as number for sorting
    typeAsNumber = Table.TransformColumnTypes(hdrCol, {{"Column1", Int64.Type}, {"Column2", Int64.Type}}),
    sorted = Table.Sort(typeAsNumber, {{"Column2", Order.Ascending}, {"Column1", Order.Ascending}}),
    //type as text to combine
    typeAsText = Table.TransformColumnTypes(sorted, {{"Column1", Text.Type}, {"Column2", Text.Type}}),
    combine = Table.CombineColumns(typeAsText, {"Column1", "Column2"}, Combiner.CombineTextByDelimiter(" "), "orderHeaders"),
    //re-order the column headers in the original table
    newOrder = Table.ReorderColumns(tbl, combine[orderHeaders])
in
    newOrder
And you could call it from your original query as:
let
    Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"41 2021", Int64.Type}, {"42 2021", Int64.Type}, {"43 2021", Int64.Type},
        {"46 2021", Int64.Type}, {"1 2022", Int64.Type}, {"48 2021", Int64.Type},
        {"50 2021", Int64.Type}, {"49 2021", Int64.Type}, {"51 2021", Int64.Type},
        {"52 2021", Int64.Type}}),
    //custom function here
    sortHdrs = fnSortHeaders(#"Changed Type")
in
    sortHdrs
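The split-sort-recombine approach above is language-agnostic; here is a minimal Python sketch of the same idea (sample header values taken from the question), which may help make the algorithm concrete:

```python
# Sort "week year" column headers numerically by (year, week).
def sort_week_year_headers(headers):
    def key(h):
        week, year = h.split(" ")       # split on the space delimiter
        return (int(year), int(week))   # year first, then week
    return sorted(headers, key=key)

headers = ["41 2021", "1 2022", "52 2021", "49 2021"]
print(sort_week_year_headers(headers))
# → ['41 2021', '49 2021', '52 2021', '1 2022']
```

As in the M version, "1 2022" sorts after all the 2021 weeks because the comparison is numeric on year first, not alphabetical on the combined string.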
If ALL of your columns are in the "weeknumber year" format, so that you want to sort all of them, it may be simpler to do this without a separate function. The advantage of the separate function is that you can more easily exclude the column names you don't want to include in the sort.
Here is code that sorts the column headers with no need for a separate function. It uses the same algorithm as above:
let
    Source = Excel.CurrentWorkbook(){[Name="Table13"]}[Content],
    #"Changed Type2" = Table.TransformColumnTypes(Source,{
        {"41 2021", Int64.Type}, {"42 2021", Int64.Type}, {"43 2021", Int64.Type},
        {"46 2021", Int64.Type}, {"1 2022", Int64.Type}, {"48 2021", Int64.Type},
        {"50 2021", Int64.Type}, {"49 2021", Int64.Type}, {"51 2021", Int64.Type},
        {"52 2021", Int64.Type}}),
    //create header list
    //this assumes ALL column headers are to be sorted, but you can make partial lists for the column sorting
    hdrs = List.Combine(List.Transform(Table.ColumnNames(#"Changed Type2"), each Text.Split(_," "))),
    //split into week and year and put into a table
    weekYear = Table.FromColumns({List.Alternate(hdrs,1,1,1), List.Alternate(hdrs,1,1,0)}, {"Week", "Year"}),
    //change data type to number and sort by Year and then Week
    #"Changed Type" = Table.TransformColumnTypes(weekYear, {{"Week", Int64.Type}, {"Year", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type", {{"Year", Order.Ascending}, {"Week", Order.Ascending}}),
    //change data type back to text and join with a space delimiter
    #"Changed Type1" = Table.TransformColumnTypes(#"Sorted Rows", {{"Week", type text}, {"Year", type text}}),
    sortedHeaders = Table.CombineColumns(#"Changed Type1", {"Week", "Year"},
        Combiner.CombineTextByDelimiter(" ", QuoteStyle.None), "newOrder")[newOrder],
    //re-order the original table
    sortedTableColumns = Table.ReorderColumns(#"Changed Type2", sortedHeaders)
in
    sortedTableColumns
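The only non-obvious step above is List.Alternate, which de-interleaves the flat header list into weeks and years. In Python the same de-interleaving is just slicing; a sketch with sample values:

```python
# Flattened "week year" pairs, as produced by splitting each header on a space.
hdrs = ["41", "2021", "1", "2022", "52", "2021"]

weeks = hdrs[0::2]   # every other element, starting at index 0
years = hdrs[1::2]   # every other element, starting at index 1
print(weeks, years)
# → ['41', '1', '52'] ['2021', '2022', '2021']
```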


Pyspark array preserving order

I have a structure along these lines: an invoice table and an invoice-lines table. I want to output the lines as a JSON ordered array in a mandated schema, ordered by line number, but the line number isn't in the schema (it is assumed to be implicit in the array). As I understand it, both PySpark and JSON will preserve the array order once it is created. Please see the rough example below. How can I make sure the invoice lines preserve the line-number order? I could do it using a list comprehension, but this means dropping out of Spark, which I think would be inefficient.
from pyspark.sql.functions import collect_list, struct
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

invColumns = StructType([
    StructField("invoiceNo", StringType(), True),
    StructField("invoiceStuff", StringType(), True)
])
invData = [("1", "stuff"), ("2", "other stuff"), ("3", "more stuff")]

invLines = StructType([
    StructField("lineNo", IntegerType(), True),
    StructField("invoiceNo", StringType(), True),
    StructField("detail", StringType(), True),
    StructField("quantity", IntegerType(), True)
])
lineData = [(1,"1","item stuff",3),(2,"1","new item stuff",2),(3,"1","old item stuff",5),(1,"2","item stuff",3),(1,"3","item stuff",3),(2,"3","more item stuff",7)]

invoice_df = spark.createDataFrame(data=invData, schema=invColumns)   # in reality read from a spark table
invLine_df = spark.createDataFrame(data=lineData, schema=invLines)    # in reality read from a spark table

invoicesTemp_df = (invoice_df.select('invoiceNo', 'invoiceStuff')
    .join(invLine_df.select('lineNo', 'invoiceNo', 'detail', 'quantity'),
          on='invoiceNo'))
invoicesOut_df = (invoicesTemp_df
    .withColumn('invoiceLines', struct('detail', 'quantity'))
    .groupBy('invoiceNo', 'invoiceStuff')
    .agg(collect_list('invoiceLines').alias('invoiceLines'))
    .select('invoiceNo', 'invoiceStuff', 'invoiceLines'))
display(invoicesOut_df)
3 -- more stuff  -- array -- 0: {"detail": "item stuff", "quantity": 3}
                          -- 1: {"detail": "more item stuff", "quantity": 7}
1 -- stuff       -- array -- 0: {"detail": "new item stuff", "quantity": 2}
                          -- 1: {"detail": "old item stuff", "quantity": 5}
                          -- 2: {"detail": "item stuff", "quantity": 3}
2 -- other stuff -- array -- 0: {"detail": "item stuff", "quantity": 3}
The following, as requested, is the input data.
Invoice Table
"InvoiceNo", "InvoiceStuff",
"1","stuff",
"2","other stuff",
"3","more stuff"
Invoice Lines Table
"LineNo","InvoiceNo","Detail","Quantity",
1,"1","item stuff",3,
2,"1","new item stuff",2,
3,"1","old item stuff",5,
1,"2","item stuff",3,
1,"3","item stuff",3,
2,"3","more item stuff",7
and the output should look like this, but the arrays should be ordered by the line number from the invoice-lines table, even though the line number isn't in the output.
Output
"1","stuff","[{"detail": "item stuff", "quantity": 3},{"detail": "new item stuff", "quantity": 2},{"detail": "old item stuff", "quantity": 5}]",
"2","other stuff","[{"detail": "item stuff", "quantity": 3}]"
"3","more stuff","[{"detail": "item stuff", "quantity": 3},{"detail": "more item stuff", "quantity": 7}]"
collect_list does not respect the data's order. The documentation notes:
"The function is non-deterministic because the order of collected results depends on the order of the rows, which may be non-deterministic after a shuffle."
One possible way to do that is applying collect_list with a window function, where you can control the order.
from pyspark.sql import functions as F
from pyspark.sql import Window as W

(invoice_df
    .join(invLine_df, on='invoiceNo')
    .withColumn('invoiceLines', F.struct('lineNo', 'detail', 'quantity'))
    .withColumn('a', F.collect_list('invoiceLines').over(W.partitionBy('invoiceNo').orderBy('lineNo')))
    .groupBy('invoiceNo')
    .agg(F.max('a').alias('invoiceLines'))
    .show(10, False)
)
+---------+--------------------------------------------------------------------+
|invoiceNo|invoiceLines |
+---------+--------------------------------------------------------------------+
|1 |[{1, item stuff, 3}, {2, new item stuff, 2}, {3, old item stuff, 5}]|
|2 |[{1, item stuff, 3}] |
|3 |[{1, item stuff, 3}, {2, more item stuff, 7}] |
+---------+--------------------------------------------------------------------+
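Outside Spark, the same group-then-order logic can be sketched in plain Python (using the sample line data from the question); this makes explicit why sorting on lineNo before dropping it guarantees the array order:

```python
from collections import defaultdict

lineData = [(1, "1", "item stuff", 3), (2, "1", "new item stuff", 2),
            (3, "1", "old item stuff", 5), (1, "2", "item stuff", 3),
            (1, "3", "item stuff", 3), (2, "3", "more item stuff", 7)]

# Group lines by invoice number.
grouped = defaultdict(list)
for line_no, invoice_no, detail, quantity in lineData:
    grouped[invoice_no].append((line_no, detail, quantity))

# Sort each group by lineNo, then drop lineNo from the output structs.
invoice_lines = {
    inv: [{"detail": d, "quantity": q} for _, d, q in sorted(lines)]
    for inv, lines in grouped.items()
}
print(invoice_lines["1"])
# → [{'detail': 'item stuff', 'quantity': 3}, {'detail': 'new item stuff', 'quantity': 2}, {'detail': 'old item stuff', 'quantity': 5}]
```

The window-function answer above does the same thing inside Spark, so nothing has to drop out of the distributed execution.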

Create an array of months and years in Rails

I need to create an array like this:
["January 2020", "February 2020", "March 2020", "April 2020", "May 2020", "June 2020", and so on till last month]
With Date::MONTHNAMES, it enumerates only the months, but I can't find a way to add the years.
Thank you,
You can use the map method.
month_names = Date::MONTHNAMES.compact.map{ |m| "#{m} #{Time.zone.now.year}" }
p month_names
#=> ["January 2021", "February 2021", "March 2021", "April 2021",
"May 2021", "June 2021", "July 2021", "August 2021", "September 2021",
"October 2021", "November 2021", "December 2021"]
You can simply map it and add the current year, something like this Date::MONTHNAMES.compact.map{ |month| "#{month} #{Date.current.year}" }
I would go with:
def month_names(year)
  1.upto(12).map do |month|
    Date.new(year, month).strftime("%b %Y")
  end
end
While this seems like overkill compared to simple string concatenation, you can easily swap out strftime for the I18n module to localize it.
def month_names(year)
  1.upto(12).map do |month|
    I18n.localize(Date.new(year, month), format: :long)
  end
end
# config/locale/pirate.yml
pirate:
  date:
    formats:
      long: "Aargh! it be the on the fair month of %m %Y"
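The question asked for the list to run "till last month", which the Ruby answers above don't handle (they hard-code twelve months of one year). A hedged Python sketch of that dynamic cutoff, with the "today" date passed in so the behavior is explicit:

```python
from datetime import date

def month_names_until_last_month(start_year, today):
    """Month-year labels from January of start_year up to the month before today."""
    labels = []
    year, month = start_year, 1
    while (year, month) < (today.year, today.month):
        labels.append(date(year, month, 1).strftime("%B %Y"))
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return labels

print(month_names_until_last_month(2020, date(2020, 4, 15)))
# → ['January 2020', 'February 2020', 'March 2020']
```

In Ruby the same cutoff could be expressed with a date range ending at `Date.current.prev_month`; the point is simply that the end month comes from the clock, not a constant.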

In a Ruby on Rails app, I'm trying to loop through an array within a hash within an array. Why am I getting a "syntax error" message?

I have a Ruby on Rails application to enter results and create a league table for a football competition.
I'm trying to input some results by creating records in the database through heroku and I get error messages.
The application isn't perfectly designed: to enter the results, I have to create the fixtures and enter the score for each team. Then, independently I have to record each goal scorer, creating a record for each goal which is either associated with an existing player or requires me to firstly create a new player and then create the goal.
When I ran the code below on Heroku, I got this error:
syntax error, unexpected ':', expecting keyword_end
Maybe I'm missing something simple about looping through an array within a hash?
Thank you for any advice!
coalition = Team.find_by(name: "Coalition")
moscow_rebels = Team.find_by(name: "Moscow Rebels")
red_star = Team.find_by(name: "Red Star")
unsanctionables = Team.find_by(name: "The Unsanctionables")
cavalry = Team.find_by(name: "Cavalry")
galactics = Team.find_by(name: "The Galactics")
happy_sundays = Team.find_by(name: "Happy Sundays")
hardmen = Team.find_by(name: "Hardmen")
international = Team.find_by(name: "International")
evropa = Venue.find_by(name: "Evropa")
s28 = Season.find_by(number: 28)
start_time = DateTime.new(2020,9,6,11,0,0,'+03:00')
scheduled_matches_1 =
[
{team_1: cavalry, team_1_goals: 1, team_1_scorers: ["Minaev"], team_2_goals: 6, team_2_scorers: ["Kovalev", "Kovalev", "Kovalev", "Thomas", "Thomas", "Grivachev"], team_2: coalition, time: start_time, venue: evropa, season: s28},
{team_1: hardmen, team_1_goals: 4, team_1_scorers: ["Jones", "Jones", "Jones", "Fusi"], team_2_goals: 2, team_2_scorers: ["Kazamula", "Ario"], team_2: galactics, time: start_time + 1.hour, venue: evropa, season: s28},
{team_1: international, team_1_goals: 9, team_1_scorers: ["Kimonnen", "Kimonnen", "Kimonnen", "Burya", "Burya", "Zakharyaev", "Zakharyaev", "Lavruk", "Rihter"], team_2_goals: 0, team_2_scorers: [], team_2: happy_sundays, time: start_time+2.hours, venue: evropa, season: s28}
]
scheduled_matches.each do |match|
  new_fixture = Fixture.create(time: match[:time], venue: match[:venue], season: match[:season])
  tf1 = TeamFixture.create(team: match[:team_1], fixture: new_fixture)
  tf2 = TeamFixture.create(team: match[:team_2], fixture: new_fixture)
  ts1 = TeamScore.create(team_fixture: tf1, total_goals: match{:team_1_goals})
  ts2 = TeamScore.create(team_fixture: tf2, total_goals: match{:team_2_goals})
  match[:team_1_scorers].each do |scorer|
    if Player.exists?(team: tf1.team, last_name: scorer)
      Goal.create(team_score: ts1, player: Player.find_by(last_name: scorer))
    else
      new_player = Player.create(team: tf1.team, last_name: scorer)
      Goal.create(team_score: ts1, player: new_player)
    end
  end
  match[:team_2_scorers].each do |scorer_2|
    if Player.exists?(team: tf2.team, last_name: scorer_2)
      Goal.create(team_score: ts2, player: Player.find_by(last_name: scorer_2))
    else
      new_player = Player.create(team: tf2.team, last_name: scorer_2)
      Goal.create(team_score: ts2, player: new_player)
    end
  end
end
It looks like you are using braces when you meant to use brackets to access the hash. Below is one instance of the issue; the same issue is in ts2.
ts1 = TeamScore.create(team_fixture: tf1, total_goals: match{:team_1_goals})
should be match[:team_1_goals]
ts1 = TeamScore.create(team_fixture: tf1, total_goals: match[:team_1_goals])
It may also be because you have scheduled_matches_1 at the top and scheduled_matches.each do... further down.
But the real issue here is that your variable names match the data content, rather than being used to hold the content. If a new team joins your league, you have to change the code. Next week, you are going to have to change the hard-coded date value. Your scheduled_matches_1 data structure includes the ActiveRecord objects returned by the first set of Team.find_by(name: ...) calls. It would be easier to fetch these objects from the database inside your loops, and just hold the team name as a string in the hash.
There is some duplication too. Consider that each fixture has a home team and an away team. Each team has a name and an array (possibly empty) of the players who scored. We don't need the number of goals; we can just count the number of players in the 'scorers' array. The other attributes, like the venue and season, belong to the fixture, not the team. So your hash might be better as
{
"fixtures": [
{
"home": {
"name": "Cavalry",
"scorers": [
"Minaev"
]
},
"away": {
"name": "Coalition",
"scorers": [
"Kovalev",
"Kovalev",
"Kovalev",
"Thomas",
"Thomas",
"Grivachev"
]
},
"venue": "Evropa",
"season": "s28"
}
]
}
because then you can create a reusable method to process each team. And maybe create a new method that returns the player (which it either finds or creates) which can be called by the loop that adds the goals.
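To sketch that refactoring idea (in Python rather than Ruby, purely as an illustration of the structure above): one function handles either side of a fixture, and the goal count is derived from the scorers list instead of being stored separately.

```python
def process_team(team):
    """Process one side of a fixture; the score is derived, not hand-entered."""
    return {
        "name": team["name"],
        "goals": len(team["scorers"]),   # count the scorers array
        "scorers": team["scorers"],
    }

# One fixture in the restructured shape proposed above.
fixture = {
    "home": {"name": "Cavalry", "scorers": ["Minaev"]},
    "away": {"name": "Coalition", "scorers": ["Kovalev", "Kovalev", "Kovalev",
                                              "Thomas", "Thomas", "Grivachev"]},
    "venue": "Evropa",
    "season": "s28",
}
result = {side: process_team(fixture[side]) for side in ("home", "away")}
print(result["home"]["goals"], result["away"]["goals"])
# → 1 6
```

Because the same function runs for both sides, there is no duplicated team_1/team_2 code, and a data-entry mismatch between the goal total and the scorers list becomes impossible.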
Also, as it stands, I'm not sure the code can handle 'own goals', either. Perhaps something for a future iteration :)

Optimizing Query in Power Query M

I am trying to get some statistics about my DB and my code seems to work perfect, but I got a real big DB and after trying to run this script on it, I ended up alway with Timeout failure, doesn't matter if I removed some unnecessary rows or not , I still getting the same error and the script is the following :
// query: Tables_profile
let
    Source = Sql.Database("DBTEST", "DB_TST", [CreateNavigationProperties=false]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Table")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "Profile",
        each Table.Profile([Data])),
    #"Expanded Profile" = Table.ExpandTableColumn(#"Added Custom",
        "Profile",
        {"Column", "Min", "Max", "Average", "StandardDeviation", "Count",
         "NullCount", "DistinctCount"},
        {"Column", "Min", "Max", "Average", "StandardDeviation", "Count",
         "NullCount", "DistinctCount"})
in
    #"Expanded Profile"

// follow-up query referencing Tables_profile
let
    #"Entfernte Spalten" = Table.RemoveColumns(Tables_profile, {"Data"}),
    #"Gefilterte Zeilen" = Table.SelectRows(#"Entfernte Spalten", each true)
in
    #"Gefilterte Zeilen"
I do it this way, raising the command timeout (here, five hours):
Sql.Database("Server", "Database", [Query="Select * From ...", CommandTimeout=#duration(0, 5, 0, 0)])

Azure Search scoring

I have sets of 3 identical (in text) items in Azure Search, varying in Price and Points. Cheaper products with higher points are boosted higher. (Price is boosted more than Points, and is boosted inversely.)
However, I keep seeing search results similar to this.
The search is on 'john milton'.
I get:
Product="Id = 2-462109171829-1, Price=116.57, Points= 7, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.499783
Product="Id = 2-462109171829-2, Price=116.40, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.454872
Product="Id = 2-462109171829-3, Price=115.64, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=32.316270
I expect the scoring order to be something like this, with the lowest price first.
Product="Id = 2-462109171829-3, Price=115.64, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
Product="Id = 2-462109171829-2, Price=116.40, Points= 9, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
Product="Id = 2-462109171829-1, Price=116.57, Points= 7, Name=Life of Schamyl / John Milton Mackie, Description=.", Score=
What am I missing, or are minor scoring variations acceptable?
The index is defined as
let ProductDataIndex =
let fields =
[|
new Field (
"id",
DataType.String,
IsKey = true,
IsSearchable = true);
new Field (
"culture",
DataType.String,
IsSearchable = true);
new Field (
"gran",
DataType.String,
IsSearchable = true);
new Field (
"name",
DataType.String,
IsSearchable = true);
new Field (
"description",
DataType.String,
IsSearchable = true);
new Field (
"price",
DataType.Double,
IsSortable = true,
IsFilterable = true)
new Field (
"points",
DataType.Int32,
IsSortable = true,
IsFilterable = true)
|]
let weightsText =
new TextWeights(
Weights = ([|
("name", 4.);
("description", 2.)
|]
|> dict))
let priceBoost =
new MagnitudeScoringFunction(
new MagnitudeScoringParameters(
BoostingRangeStart = 1000.0,
BoostingRangeEnd = 0.0,
ShouldBoostBeyondRangeByConstant = true),
"price",
10.0)
let pointsBoost =
new MagnitudeScoringFunction(
new MagnitudeScoringParameters(
BoostingRangeStart = 0.0,
BoostingRangeEnd = 10000000.0,
ShouldBoostBeyondRangeByConstant = true),
"points",
2.0)
let scoringProfileMain =
new ScoringProfile (
"main",
TextWeights =
weightsText,
Functions =
new List<ScoringFunction>(
[
priceBoost :> ScoringFunction
pointsBoost :> ScoringFunction
]),
FunctionAggregation =
ScoringFunctionAggregation.Sum)
new Index
(Name = ProductIndexName
,Fields = fields
,ScoringProfiles = new List<ScoringProfile>(
[
scoringProfileMain
]))
All indexes in Azure Search are split into multiple shards, allowing for quick scale-up and scale-down. When a search request is issued, it's issued against each of the shards independently. The result sets from each of the shards are then merged and ordered by score (if no other ordering is defined). It is important to know that the scoring function weighs query term frequency in each document against its frequency in all documents in the shard!
It means that in your scenario, in which you have three instances of every document, even with scoring profiles disabled, if one of those documents lands on a different shard than the other two, its score will be slightly different. The more data in your index, the smaller the differences will be (more even term distribution). It's not possible to assume which shard any given document will be placed on.
In general, document score is not the best attribute for ordering documents. It should only give you a general sense of document relevance against other documents in the result set. In your scenario, it would be possible to order the results by price and/or points if you marked the price and/or points fields as sortable. You can find more information on how to use the $orderby query parameter here: https://msdn.microsoft.com/en-us/library/azure/dn798927.aspx
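The deterministic ordering this answer recommends (an explicit $orderby rather than relying on relevance score) is easy to illustrate client-side. A sketch over the sample results from the question, sorting the way the asker expected (higher points first, then lower price):

```python
# Sample results from the question; score omitted, since it is not stable.
results = [
    {"id": "2-462109171829-1", "price": 116.57, "points": 7},
    {"id": "2-462109171829-2", "price": 116.40, "points": 9},
    {"id": "2-462109171829-3", "price": 115.64, "points": 9},
]

# Equivalent intent to $orderby=points desc, price asc: explicit sort keys
# instead of depending on score tie-breaking across shards.
ordered = sorted(results, key=lambda r: (-r["points"], r["price"]))
print([r["id"] for r in ordered])
# → ['2-462109171829-3', '2-462109171829-2', '2-462109171829-1']
```

With $orderby on sortable fields, this ordering is computed by the service and is stable regardless of which shard each document lives on.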
