PySpark remove the need for loop - loops

I am fairly new to PySpark and need to work out how to use PySpark correctly and remove the need for this loop which is taking hours to run. I am looping through customer records dataframe then checking to see if an Engineer is available in customers area in the Engineer Dataframe. If there is an Engineer available I do JOBS_LEFT-1 in the Engineer Dataframe. I also mark the customer as having an Engineer assigned.
Appreciate any help
Thanks
engAvail=spark.sql("SELECT EMPLOYEE_NO,concat('#',SECTOR,',') as SECTOR,JOBS_LEFT from tmpEmployeeAvailability WHERE JOBS_LEFT >0 ORDER BY JOBS_LEFT desc")
for index,rows in df_priority_customer.iterrows():
engAvail = engAvail.filter(col("JOBS_LEFT")>0)
pd = "#" + df_priority_customer.loc[index,'POSTAL_DISTRICT'] + ","
empNo= engAvail.filter(col("SECTOR").contains(pd))
if (empNo.count() > 0) :
#print(empNo.count())
#print(empNo.collect()[0][0]) #First Row
engAvail = engAvail.withColumn('JOBS_LEFT', when(engAvail.EMPLOYEE_NO == int(empNo.collect()[0][0]), engAvail.JOBS_LEFT-1).otherwise(engAvail.JOBS_LEFT))
df_priority_customer.at[index,'ASSIGNED_ENGINEER'] = 'Y'
else:
#print('No Engineer')
df_priority_customer.at[index,'ASSIGNED_ENGINEER'] = 'N'

Related

Peewee select query with multiple joins and multiple counts

I've been attempting to write a peewee select query which results in a table with 2 counts (one for the number of prizes associated with the lottery, and the for the number of packages associated with the lottery), as well as the fields in the Lottery model.
I've managed to write select queries with 1 count working (seen below), and then I've had to convert the ModelSelects to lists and join them manually (which I think is very hacky).
I did manage to write a select query where the results were joined, but it would multiply the packages count with the prizes count (I've since lost that query).
I also tried using a .switch(Lottery) but I didn't have any luck with this.
query1 = (Lottery.select(Lottery,fn.count(Package.id).alias('packages'))
.join(LotteryPackage)
.join(Package)
.order_by(Lottery.id)
.group_by(Lottery)
.dicts())
query2 = (Lottery.select(Lottery.id.alias('lotteryID'), fn.count(Prize.id).alias('prizes'))
.join(LotteryPrize)
.join(Prize)
.group_by(Lottery)
.order_by(Lottery.id)
.dicts())
lottery = list(query1)
query3 = list(query2)
for x in range(len(lottery)):
lottery[x]['prizes'] = query3[x]['prizes']
While the above code works, is there a cleaner way to write this query?
Your best bet is to do this with subqueries.
# Create query which gets lottery id and count of packages.
L1 = Lottery.alias()
subq1 = (L1
.select(L1.id, fn.COUNT(LotteryPackage.package).alias('packages'))
.join(LotteryPackage, JOIN.LEFT_OUTER)
.group_by(L1.id))
# Create query which gets lottery id and count of prizes.
L2 = Lottery.alias()
subq2 = (L2
.select(L2.id, fn.COUNT(LotteryPrize.prize).alias('prizes'))
.join(LotteryPrize, JOIN.LEFT_OUTER)
.group_by(L2.id))
# Select from lottery, joining on each subquery and returning
# the counts.
query = (Lottery
.select(Lottery, subq1.c.packages, subq2.c.prizes)
.join(subq1, on=(Lottery.id == subq1.c.id))
.join(subq2, on=(Lottery.id == subq2.c.id))
.order_by(Lottery.name))
for row in query.objects():
print(row.name, row.packages, row.prizes)

Iterate through a list using SQL While Loop

I have written a SQL query that joins two tables and calculates the difference between two variables. I want a while loop iterate through the code and do the procedure for a list of organizations numbers orgnr.
Use Intra;
drop table #Tabell1
go
select Månad = A.Manad, Intrastat = A.varde, Moms = B.vardeutf
into #Tabell1
From
IntrastatFsum A
join
Momsuppg B
on A.Orgnr = B.Orgnr
where A.Orgnr = 165564845492
AND A.Ar = 2017
AND B.Ar = A.AR
--AND A.Manad = 11
AND A.Manad = B.Manad
AND A.InfUtf = 'U'
order by A.Manad
select *, Differens = (Intrastat - Moms)/ Moms from #Tabell1
Where do I put the while loop and how should I write it? I don't have a lot of experience with SQL so any help is appreciated.
wasnt clear at all when posting this question was in a big hurry, so sorry guys for that. But what im trying to do is:
This code just runs trough some data, calculates the difference in % for a special company. Each month we get a list of companies that show high standard deviation for some variables. So we have to go through them and compare the reported value with the taxreturn. What im trying to do is write a code that compares the values for "Intra" which is the reported value and " Moms" which is reported tax value. That part i have already done what i need to do now is to insert a loop into the code that goes through a list instead where i have stored the orgnr for companies that show high std for some variables.I want it to keep a list of the companies where the difference between the reported values and taxreturn is high, so i can take an extra look at them. Because sometimes the taxreturn and reported value to ous is almost the same, so i try to remove the most obvious cases with automation
I don't think you need a loop in your query.
Try changing this where-clause:
where A.Orgnr = 165564845492
To this:
where A.Orgnr in (165564845492, 123, 456, 678)
just remove A.Orgnr = 165564845492 in where clause, there is no need of loop
Use Intra;
drop table #Tabell1
go
select Månad = A.Manad, Intrastat = A.varde,Moms = B.vardeutf
into #Tabell1
From
IntrastatFsum A
join
Momsuppg B
on A.Orgnr = B.Orgnr
where A.Ar = 2017
AND B.Ar = A.AR
--AND A.Manad = 11
AND A.Manad = B.Manad
AND A.InfUtf = 'U'
order by A.Manad
select *, Differens = (Intrastat - Moms)/ Moms from #Tabell1
Question is not clear (to me)
select Månad = A.Manad, Intrastat = A.varde, Moms = B.vardeutf
, Differens = (A.varde - B.vardeutf) / B.vardeutf
From
IntrastatFsum A
join
Momsuppg B
on A.Orgnr = B.Orgnr
AND A.Orgnr in (165564845492, ...)
AND A.Ar = 2017
AND A.Manad = B.Manad
AND A.InfUtf = 'U'
order by A.Manad

how to for each loop on data table within another datatable

I have a web poll application that reads and processes orders from sql server back end and inserts into windss(jda).
the flow
1) I read all the orders from the sql server tables that is not processed
2) then I need to divide the orders into two sets
- set 1 orders with bundle kit( 1 order with several suborders)
- set 2 is just orders with no bundle kit
3) then after each order is processed I need to update the isprocessed sql field to 1
1) code for reading all orders
Frmmain.vb
dt = WebDatabase.GetAllOrdersFromDatabase()**'this method is below**
For Each drow In dt.Rows
If CDbl(WinDSSStoreNumber) = 123 Or CDbl(WinDSSStoreNumber) = 124 Then
' Store 123 and 124 = customer service - only orders with qualifying source codes
my question is:
the above for each checks if the store is 123 or 124 but now I want to implement another for each loop that reads all the InvoiceHeader_Id that the second field in the datatable and then check if it has a bundle kit or not if it does then process it and update the isprocessed field second last in the datatable else process the orders without any bundle kit and update isprocessed for those as well. please any help is generously appreciated, please ask me any questions in the below comment before marking my question down.
WebData.vb
Public Function GetAllOrdersFromDatabase() As DataTable
DrivePath = "C:\Users\somjething\Documents\files\somefiles\"
Dim ds As DataTable = Nothing
WinDSS.connectionString = ConfigurationManager.AppSettings("WinDSS_Connection").Replace("%DrivePath%", DrivePath)
ds = WinDSS.GetSysMst()
If ds.Rows.Count > 0 Then
WinDSSStoreNumber = ds.Rows(0)("store_no")
End If
Dim dt As New DataTable()
Dim conn As New SqlConnection(ConfigurationManager.AppSettings("WebData_Connection") & ConfigurationManager.AppSettings("WebDataSource"))
Dim cmd As SqlCommand
Dim da As New SqlDataAdapter
conn.Open()
cmd = conn.CreateCommand()
cmd.CommandType = CommandType.Text
cmd.CommandText= SELECT InvoiceDetail_Id, InvoiceHeader_Id, ActualFreightCharge, AmountCollected,
ChargedActualFreight, CollectedExternally, CollectedThroughAR, DateInvoiced, InvoiceNo,
InvoiceType, LineSubTotal, MasterInvoiceNo, OrderInvoiceKey, Reference1, TotalAmount,
TotalTax, isprocessed, storetoprocess
FROM dbo.InvoiceHeader
WHERE (isprocessed = 0) AND (storetoprocess = N'123')
da.SelectCommand = cmd
da.Fill(dt)
conn.Close()
Return dt
End Function
Additoinal info
This is the information given to me by my manager:
loop throught each below
SELECT InvoiceDetail_Id, InvoiceHeader_Id, ActualFreightCharge, AmountCollected,
ChargedActualFreight, CollectedExternally, CollectedThroughAR, DateInvoiced, InvoiceNo,
InvoiceType, LineSubTotal, MasterInvoiceNo, OrderInvoiceKey, Reference1, TotalAmount,
TotalTax, isprocessed, storetoprocess
FROM dbo.InvoiceHeader WHERE (isprocessed = 0) AND (storetoprocess = N'195')
Use the above InvoiceHeader_Id to loop throught all of the order information. Process each Bundle (Kit/)
SELECT InvoiceHeader_Id, LineDetails_Id, LineDetail_Id, OrderLine_Id, GiftFlag, [References],
GiftWrap, IsBundleParent, KitCode, KitQty, LevelOfService, LineSeqNo, LineType,
MaxLineStatus, MaxLineStatusDesc, MinLineStatus, MinLineStatusDesc, MinShipByDate,
OpenQty, OrderHeaderKey, OrderLineKey, OrderedQty, OriginalOrderedQty, OtherCharges,
OverallStatus, PipelineKey, PrimeLineNo, ReceivingNode, RemainingQty, ReqCancelDate,
ReqDeliveryDate,
ReqShipDate, SCAC, ScacAndService, ScacAndServiceKey, ShipNode, ShipToID, ShipToKey,
StatusQuantity, SubLineNo, SubstituteItemID, isprocessed
FROM dbo.OrderLine
WHERE (isprocessed = 0) AND (InvoiceHeader_Id = 13) AND (IsBundleParent = 'Y')
ORDER BY PrimeLineNo, SubLineNo
For each record in the above list query below and add as Zero (0) dollar amount. The cost is associated with the information above record (you may have to query tables to ge the values)
Get the OrderLineKey from the above record and query this below to get the associated sub items.
SELECT TOP (100) PERCENT dbo.BundleParentLine.InvoiceHeader_Id AS BPL_InvoiceHeader_Id,
dbo.BundleParentLine.LineDetails_Id AS BPL_LineDetails_Id, dbo.BundleParentLine.LineDetail_Id AS BPL_LineDetail_Id, dbo.BundleParentLine.OrderLine_Id AS BPL_OrderLine_Id, dbo.BundleParentLine.BundleParentLine_id, dbo.BundleParentLine.SubLineNo AS BPL_SubLineNo, dbo.BundleParentLine.PrimeLineNo AS BPL_PrimeLineNo, dbo.BundleParentLine.OrderLineKey AS BPL_OrderLineKey, dbo.OrderLine.*
FROM dbo.BundleParentLine INNER JOIN dbo.OrderLine ON dbo.BundleParentLine.OrderLine_Id = dbo.OrderLine.OrderLine_Id AND dbo.BundleParentLine.LineDetail_Id = dbo.OrderLine.LineDetail_Id AND dbo.BundleParentLine.LineDetails_Id = dbo.OrderLine.LineDetails_Id AND dbo.BundleParentLine.InvoiceHeader_Id = dbo.OrderLine.InvoiceHeader_Id
WHERE (dbo.BundleParentLine.OrderLineKey = N'76873264832') AND (dbo.BundleParentLine.InvoiceHeader_Id = 13) AND (dbo.BundleParentLine.LineDetails_Id = 6) AND (dbo.OrderLine.isprocessed = 0)
ORDER BY BPL_PrimeLineNo, BPL_SubLineNo
After Processed, set isprocessed = 1
Use the above InvoiceHeader_Id to loop throught all of the order information
SELECT InvoiceHeader_Id, LineDetails_Id, LineDetail_Id, OrderLine_Id, GiftFlag, [References],
GiftWrap, IsBundleParent, KitCode, KitQty, LevelOfService, LineSeqNo,
LineType,
MaxLineStatus, MaxLineStatusDesc, MinLineStatus, MinLineStatusDesc, MinShipByDate,
OpenQty, OrderHeaderKey, OrderLineKey, OrderedQty,
OriginalOrderedQty,
OtherCharges, OverallStatus, PipelineKey, PrimeLineNo, ReceivingNode, RemainingQty,
ReqCancelDate, ReqDeliveryDate, ReqShipDate, SCAC,
ScacAndService,
ScacAndServiceKey, ShipNode, ShipToID, ShipToKey, StatusQuantity,
SubLineNo, SubstituteItemID, isprocessed
FROM dbo.OrderLine
WHERE (isprocessed = 0) AND (InvoiceHeader_Id = 13) AND (IsBundleParent <> 'Y')
ORDER BY PrimeLineNo, SubLineNo
After Processed, set isprocessed = 1
This is not an answer but I want to pose this to prompt you for some clarification and it's not practical in the comments.
First, close visual studio and open SQL Server Management Studio and connect to the database
We'll build some select statements to get the data back and then we can think about an UPDATE statement
This first query you posted gives you the unprocessed headers for store 195 (straight from your post). I don't know if you want to do this for all stores or not. Paste it into SSMS and run it
SELECT H.*
FROM dbo.InvoiceHeader H
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
Now, this second query gives you the unprocessed lines against that header that have IsBundleParent='Y'.
Look! No loops!
SELECT H.*, L.*
FROM dbo.InvoiceHeader H
INNER JOIN
dbo.OrderLine L
ON L.InvoiceHeader_Id = H.InvoiceHeader_Id
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
AND L.isprocessed = 0 AND L.IsBundleParent = 'Y'
Then in your explanation you have For each record in the above list query below and add as Zero (0) dollar amount which makes no sense to me so perhaps you could clarify
Now we'll add the third query in which adds BundleParentLine (which does have inner joins in it already, but it's a pretty strange list of join fields)
SELECT H.*, L.*, BPL.*
FROM dbo.InvoiceHeader H
INNER JOIN
dbo.OrderLine L
ON L.InvoiceHeader_Id = H.InvoiceHeader_Id
INNER JOIN
dbo.BundleParentLine BPL
ON BPL.OrderLine_Id = L.OrderLine_Id
AND BPL.LineDetail_Id = L.LineDetail_Id
AND BPL.LineDetails_Id = L.LineDetails_Id
AND BPL.InvoiceHeader_Id = L.InvoiceHeader_Id
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
AND L.isprocessed = 0 AND L.IsBundleParent = 'Y'
Then in your explanation you have After Processed, set isprocessed = 1, but what is 'processed' here? Is it just setting it to processed in the table?
To do that you just write an UPDATE statement. Here's one way to do that to update just the header (don't run it though until we understand what you're trying to do). Is the problem here that you want to set header and line to processed at the same time?
UPDATE InvoiceHeader
SET Processed = 1
WHERE isprocessed = 0 AND storetoprocess = N'195'
The fourth query you have seems to account for the IsBundleParent <> 'Y' case
Again, we need to know what add as Zero (0) dollar amount and processed means to go any further.

SQL Server, Having Clause, Where, Aggregate Functions

In my problem which I am trying to solve, there is a performance values table:
Staff PerformanceID Date Percentage
--------------------------------------------------
StaffName1 1 2/15/2016 95
StaffName1 2 2/15/2016 95
StaffName1 1 2/22/2016 100
...
StaffName2 1 2/15/2016 100
StaffName2 2 2/15/2016 100
StaffName2 1 2/22/2016 100
And the SQL statement as follows:
SELECT TOP (10)
tbl_Staff.StaffName,
ROUND(AVG(tbl_StaffPerformancesValues.Percentage), 0) AS AverageRating
FROM
tbl_Staff
INNER JOIN
tbl_AcademicTermsStaff ON tbl_Staff.StaffID = tbl_AcademicTermsStaff.StaffID
INNER JOIN
tbl_StaffPerformancesValues ON tbl_AcademicTermsStaff.StaffID = tbl_StaffPerformancesValues.StaffID
WHERE
(tbl_StaffPerformancesValues.Date >= #DateFrom)
AND (tbl_AcademicTermsStaff.SchoolCode = #SchoolCode)
AND (tbl_AcademicTermsStaff.AcademicTermID = #AcademicTermID)
GROUP BY
tbl_Staff.StaffName
ORDER BY
AverageRating DESC, tbl_Staff.StaffName
What I am trying to do is, from a given date, for instance 02-22-2016,
I want to calculate average performance for each staff member.
The code above gives me average without considering the date filter.
Thank you.
Apart from your join conditions and table names which looks quite complex, One simple question, If you want the results for a particular date then why is the need of having
WHERE tbl_StaffPerformancesValues.Date >= #DateFrom
As you said your query is displaying average results but not for a date instance. Change the above line to WHERE tbl_StaffPerformancesValues.Date = #DateFrom.
Correct me if I am wrong.
Thanks for the replies, the code above, as you all say and as it is also expected is correct.
I intended to have a date filter to see the results from the given date until now.
The code
WHERE tbl_StaffPerformancesValues.Date >= #DateFrom
is correct.
The mistake i found from my coding is, in another block i had the following:
Protected Sub TextBoxDateFrom_Text(sender As Object, e As System.EventArgs) Handles TextBoxDate.PreRender, TextBoxDate.TextChanged
Try
Dim strDate As String = Date.Parse(DatesOfWeekISO8601(2016, WeekOfYearISO8601(Date.Today))).AddDays(-7).ToString("dd/MM/yyyy")
If Not IsPostBack Then
TextBoxDate.Text = strDate
End If
SqlDataSourcePerformances.SelectParameters("DateFrom").DefaultValue = Date.Parse(TextBoxDate.Text, CultureInfo.CreateSpecificCulture("id-ID")).AddDays(-7)
GridViewPerformances.DataBind()
Catch ex As Exception
End Try
End Sub
I, unintentionally, applied .AddDays(-7) twice.
I just noticed it and removed the second .AddDays(-7) from my code.
SqlDataSourcePerformances.SelectParameters("DateFrom").DefaultValue = Date.Parse(TextBoxDate.Text, CultureInfo.CreateSpecificCulture("id-ID"))
Because of that mistake, the SQL code was getting the performance values 14 days before until now. So the average was wrong.
Thanks again.

Update Weight in magento 1.9 on mass

I am trying to create a MySQL statement that I can put in a php script to update the weight on a few thousand products in magento 1.9.
This is the statement I currently have:
UPDATE dp_catalog_product_entity_decimal AS ped JOIN dp_eav_attribute AS ea ON ea.entity_type_id = 10 AND ea.attribute_code = 'weight' AND ped.attribute_id = ea.attribute_id SET ped.value = 8 WHERE ped.entity_id = P1000X3;
It was partially taken from another post so I am not sure if it will work at all but I currently have the error "#1054 - Unknown column 'P1000X3' in 'where clause'"
I am not that great with sql joins and I dont really know the magento databse at all so any help to get this statement to work is much apreciated.
Thanks.
Matt
The correct SQL to update the weight of a product with a specific SKU in Magento is this (replace YOUR-MAGENTO-SKU with the sku of the item you want to update, and the 8 with the weight value).
UPDATE catalog_product_entity_decimal AS cped
JOIN eav_attribute AS ea ON ea.entity_type_id = (SELECT entity_type_id FROM eav_entity_type WHERE entity_type_code = 'catalog_product')
LEFT JOIN catalog_product_entity AS cpe ON cpe.entity_id = cped.entity_id
AND ea.attribute_code = 'weight'
AND cped.attribute_id = ea.attribute_id
SET cped.value = 8
WHERE cpe.sku = 'YOUR-MAGENTO-SKU';
In your case, you have the table prefix dp_, and the product sku P1000X3, so the correct SQL for you would be:
UPDATE dp_catalog_product_entity_decimal AS cped
JOIN dp_eav_attribute AS ea ON ea.entity_type_id = (SELECT entity_type_id FROM dp_eav_entity_type WHERE entity_type_code = 'catalog_product')
LEFT JOIN dp_catalog_product_entity AS cpe ON cpe.entity_id = cped.entity_id
AND ea.attribute_code = 'weight'
AND cped.attribute_id = ea.attribute_id
SET cped.value = 8
WHERE cpe.sku = 'P1000X3';
After running this, be sure to reindex your Magento Indexes.
Do your Magento tables have the prefix dp_? Make sure of this.
Also on this part:
WHERE ped.entity_id = P1000X3;
ped.entity_id will be an integer value (number).
I'm not sure where you got P1000X3, but using that won't work (it's a string value). Even so, strings should be wrapped with a single quotes ', like this:
'P1000X3';

Resources