HOW TO FILTER DJANGO QUERYSETS WITH MULTIPLE AGGREGATIONS - django-models

Lets say I have a django model table
class Table(models.Model):
name = models.CharField()
date_created = models.DatetimeField()
total_sales = models.DecimalField()
some data for context
Name
date-created
total-sales
a
2020-01-01
200
b
2020-02-01
300
c
2020-04-01
400
*
**********
***
c
2020-12-01
1000
c
2020-12-12
500
now I want to filter an aggregate of
total_yearly_sales = 10500
current month being December
total_monthly_sales = 1500
daily_sales
total_daily_sales = 500
also do a Group by by name
models.Table.objects.values('Name').annotate(Sum('total-sales')).order_by()
I want to do this in one query(one db hit)
Hence the query should generate
total_yearly_sales
total_monthly_sales
total_daily_sales
total_sales_grouped_by_name ie {a:200, b:300, c:1900}
I know this is too much to ask. Hence let me express my immense gratitude and thanks for having a look at this.
cheers
The above queries I can generate them individually like so
today = timezone.now().date()
todays_sales = models.Table.filter(date_created__date__gte=today, date_created___date__lte=today).aggregate(Sum('total_sales'))
=> 500
monthly_sales(this month) = models.Table.objects.filter(date_created__year=today.year, date_created__month=today.month).aggregate(Sum('total_sales'))
=>10500
total_yearly_sales = models.Table.objects.filter(date_created__year=today.year).aggregate(Sum('total_sales')) => 10500

Related

Webscraping to a DataFrame

I am trying to get information from a website, and into a Dataframe, but I'm having some trouble.
I have extracted the data, but I'm trying to merge two dataframes, and reshape them into one. Here is what I have:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.civilaviation.gov.in/'
resp = requests.get(url)
soup = BeautifulSoup(resp.content.decode(), 'html.parser')
div = soup.find('div', {'class':'airport-col vande-bharat-col'})
div2 = soup.find('div', {'class':'airport-col airport-widget'})
div['class'] = 'Domestic traffic'
div2['class'] = 'International traffic'
dom = div.get_text()
intl = div2.get_text()
def str2frame(estr, sep = '\n', lineterm = '\n\n\n\n\n', set_header = True):
dat = [x.split(sep) for x in estr.split(lineterm)][0:-1]
df = pd.DataFrame(dat)
if set_header:
df = df.T.set_index(0, drop = True).T # flip, set ix, flip back
return df
df1 = str2frame(dom)
df2 = str2frame(intl)
df1.rename(columns={"अन्तर्देशीय यातायात Domestic traffic On 29 Jan 2023":"Domestic Traffic"}, inplace=True)
df2.rename(columns={"अंतर्राष्ट्रीय यातायात International traffic On 29 Jan 2023":"International Traffic"}, inplace=True)
So now I get two separate DataFrames with all the information I want, but not in the format I want. The shape of my dataframes are 6,2(one of the columns is blank)... I need them merged into one dataframe that is 2,6. So basically I show
Domestic Traffic
1 Departing flights 2,967
2 Departing Pax 4,24,224
3 Arriving flights 2,960
4 Arriving Pax 4,18,697
5 Aircraft movements 5,927
6 Airport footfalls 8,42,921
I would like to see two rows, one for domestic and one for international traffic, and each column based on the given values. I apologize if my question or my coding is unclear. This is my first time asking a question on this forum. Thank you for your help.
Not sure if this is the expected result but you could transform and concat your data:
pd.concat([
df1.set_index(df1.columns[0]).T,
df2.set_index(df2.columns[0]).T
]).reset_index()
Output
0
Departing flights
Departing Pax
Arriving flights
Arriving Pax
Aircraft movements
Airport footfalls
0
अन्तर्देशीय यातायात Domestic traffic On 30 Jan 2023
2,862
4,07,957
2,864
4,04,799
5,726
8,12,756
1
अंतर्राष्ट्रीय यातायात International traffic On 30 Jan 2023
433
90,957
516
82,036
949
1,72,993

Bybit API - How do I calculate qty in USDT Perpetual with leverage

I'm using bybit-api to create a conditional order but don't know how do I calculate quantity. Is it based on leveraged amount or original?
for example
I have balance of 50 USDT and want to use 100% per trade with following conditions.
BTC at price 44,089.50 with 50x leverage.
SHIB at price 0.030810 with 50x leverage.
How do I calculate the qty parameter?
https://bybit-exchange.github.io/docs/linear/#t-placecond
I trade Bitcoin through USDT Perpetuals (BTCUSDT). I've setup my own python API and created my own function to calculate quantity for cross margin:
def order_quantity(self, price:float, currency:str='USDT', leverage:float=50.0):
margin = self.get_wallet_balance(currency)
instrument = Instrument(self.query_instrument()[0], 'bybit')
if not price: # Market orders
last_trade = self.ws_get_last_trade() # private function to get last trade
lastprice = float(last_trade[-1]['price'])
else: # Limit orders
lastprice = price
totalbtc = float(margin[currency]['available_balance']) * (1 - instrument.maker_fee * leverage)
rawbtc = totalbtc / lastprice
btc = math.floor(rawbtc / instrument.lot_size) * instrument.lot_size
return min(btc,instrument.max_lot_size)
It is based on the leverage amount.
Your quantity should be :
qty = 50 USDT * 50 (leverage) / 44089 (BTC price) = 0.0567 BTC

I have estimated time on a job but when I add the employee's (in this case 2) hours, it will duplicate the estimated

I have estimated time on a job but when I add the employee's (in this case 2) hours, it will duplicate the estimated. I need to divide by the number of results (maybe employee records) to get the correct answer.
SQL pull from database.
SELECT
LaborDtl.JobNum,
LaborDtl.ClockInDate,
LaborDtl.OprSeq,
EmpBasic.Name,
(LaborDtl.LaborHrs) as [TotalHrs],
((JobOper.EstSetHours + JobOper.EstProdHours) / (COUNT (EmpBasic.Name))) as [TotEstHrs],
LaborDtl.ResourceGrpID
FROM Erp.LaborDtl
left outer JOIN Erp.JobOper ON
JobOper.JobNum = LaborDtl.JobNum
AND JobOper.OprSeq = LaborDtl.OprSeq
JOIN Erp.EmpBasic ON
EmpBasic.EmpID = LaborDtl.EmployeeNum
WHERE LaborDtl.Complete = '1'
AND LaborDtl.ClockInDate = '2019-7-1'
AND LaborDtl.ResourceGrpID = '5-XM-C'
AND LaborDtl.JobNum = 'PA16742'
GROUP BY
LaborDtl.JobNum,
LaborDtl.ClockInDate,
LaborDtl.OprSeq,
EmpBasic.Name,
LaborDtl.LaborHrs,
JobOper.EstSetHours,
JobOper.EstProdHours,
LaborDtl.EmployeeNum,
LaborDtl.ResourceGrpID
JobNum ClockInDate OprSeq Name TotalHrs TotEstHrs ResourceGrpID
pa16742 2019-07-01 20 Jerry Adam 1.6300 5.00 5-XM-C
PA16742 2019-07-01 20 Xue Lee 2.68000 5.00 5-XM-C
In this case, the TotEstHrs should be 2.5 on each line.
I think this does what you want:
((JobOper.EstSetHours + JobOper.EstProdHours) / SUM(COUNT(EmpBasic.Name))
OVER ()) as [TotEstHrs],
It adds the count over all the rows and then does the division.

Sort by a key, but value has more than one element using Scala

I'm very new to Scala on Spark and wondering how you might create key value pairs, with the key having more than one element. For example, I have this dataset for baby names:
Year, Name, County, Number
2000, JOHN, KINGS, 50
2000, BOB, KINGS, 40
2000, MARY, NASSAU, 60
2001, JOHN, KINGS, 14
2001, JANE, KINGS, 30
2001, BOB, NASSAU, 45
And I want to find the most frequently occurring for each county, regardless of the year. How might I go about doing that?
I did accomplish this using a loop. Refer to below. But I'm wondering if there is shorter way to do this that utilizes Spark and Scala duality. (i.e. can I decrease computation time?)
val names = sc.textFile("names.csv").map(l => l.split(","))
val uniqueCounty = names.map(x => x(2)).distinct.collect
for (i <- 0 to uniqueCounty.length-1) {
val county = uniqueCounty(i).toString;
val eachCounty = names.filter(x => x(2) == county).map(l => (l(1),l(4))).reduceByKey((a,b) => a + b).sortBy(-_._2);
println("County:" + county + eachCounty.first)
}
Here is the solution using RDD. I am assuming you need top occurring name per county.
val data = Array((2000, "JOHN", "KINGS", 50),(2000, "BOB", "KINGS", 40),(2000, "MARY", "NASSAU", 60),(2001, "JOHN", "KINGS", 14),(2001, "JANE", "KINGS", 30),(2001, "BOB", "NASSAU", 45))
val rdd = sc.parallelize(data)
//Reduce the uniq values for county/name as combo key
val uniqNamePerCountyRdd = rdd.map(x => ((x._3,x._2),x._4)).reduceByKey(_+_)
// Group names per county.
val countyNameRdd = uniqNamePerCountyRdd.map(x=>(x._1._1,(x._1._2,x._2))).groupByKey()
// Sort and take the top name alone per county
countyNameRdd.mapValues(x => x.toList.sortBy(_._2).take(1)).collect
Output:
res8: Array[(String, List[(String, Int)])] = Array((KINGS,List((JANE,30))), (NASSAU,List((BOB,45))))
You could use the spark-csv and the Dataframe API. If you are using the new version of Spark (2.0) it is slightly different. Spark 2.0 has a native csv data source based on spark-csv.
Use spark-csv to load your csv file into a Dataframe.
val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.load(new File(getClass.getResource("/names.csv").getFile).getAbsolutePath)
df.show
Gives output:
+----+----+------+------+
|Year|Name|County|Number|
+----+----+------+------+
|2000|JOHN| KINGS| 50|
|2000| BOB| KINGS| 40|
|2000|MARY|NASSAU| 60|
|2001|JOHN| KINGS| 14|
|2001|JANE| KINGS| 30|
|2001| BOB|NASSAU| 45|
+----+----+------+------+
DataFrames uses a set of operations for structured data manipulation. You could use some basic operations to become your result.
import org.apache.spark.sql.functions._
df.select("County","Number").groupBy("County").agg(max("Number")).show
Gives output:
+------+-----------+
|County|max(Number)|
+------+-----------+
|NASSAU| 60|
| KINGS| 50|
+------+-----------+
Is this what you are trying to achieve?
Notice the import org.apache.spark.sql.functions._ which is needed for the agg() function.
More information about Dataframes API
EDIT
For correct output:
df.registerTempTable("names")
//there is probably a better query for this
sqlContext.sql("SELECT * FROM (SELECT Name, County,count(1) as Occurrence FROM names GROUP BY Name, County ORDER BY " +
"count(1) DESC) n").groupBy("County", "Name").max("Occurrence").limit(2).show
Gives output:
+------+----+---------------+
|County|Name|max(Occurrence)|
+------+----+---------------+
| KINGS|JOHN| 2|
|NASSAU|MARY| 1|
+------+----+---------------+

How to loop through table based on unique date in MATLAB

I have this table named BondData which contains the following:
Settlement Maturity Price Coupon
8/27/2016 1/12/2017 106.901 9.250
8/27/2019 1/27/2017 104.79 7.000
8/28/2016 3/30/2017 106.144 7.500
8/28/2016 4/27/2017 105.847 7.000
8/29/2016 9/4/2017 110.779 9.125
For each day in this table, I am about to perform a certain task which is to assign several values to a variable and perform necessary computations. The logic is like:
do while Settlement is the same
m_settle=current_row_settlement_value
m_maturity=current_row_maturity_value
and so on...
my_computation_here...
end
It's like I wanted to loop through my settlement dates and perform task for as long as the date is the same.
EDIT: Just to clarify my issue, I am implementing Yield Curve fitting using Nelson-Siegel and Svensson models.Here are my codes so far:
function NS_SV_Models()
load bondsdata
BondData=table(Settlement,Maturity,Price,Coupon);
BondData.Settlement = categorical(BondData.Settlement);
Settlements = categories(BondData.Settlement); % get all unique Settlement
for k = 1:numel(Settlements)
rows = BondData.Settlement==Settlements(k);
Bonds.Settle = Settlements(k); % current_row_settlement_value
Bonds.Maturity = BondData.Maturity(rows); % current_row_maturity_value
Bonds.Prices=BondData.Price(rows);
Bonds.Coupon=BondData.Coupon(rows);
Settle = Bonds.Settle;
Maturity = Bonds.Maturity;
CleanPrice = Bonds.Prices;
CouponRate = Bonds.Coupon;
Instruments = [Settle Maturity CleanPrice CouponRate];
Yield = bndyield(CleanPrice,CouponRate,Settle,Maturity);
NSModel = IRFunctionCurve.fitNelsonSiegel('Zero',Settlements(k),Instruments);
SVModel = IRFunctionCurve.fitSvensson('Zero',Settlements(k),Instruments);
NSModel.Parameters
SVModel.Parameters
end
end
Again, my main objective is to get each model's parameters (beta0, beta1, beta2, etc.) on a per day basis. I am getting an error in Instruments = [Settle Maturity CleanPrice CouponRate]; because Settle contains only one record (8/27/2016), it's suppose to have two since there are two rows for this date. Also, I noticed that Maturity, CleanPrice and CouponRate contains all records. They should only contain respective data for each day.
Hope I made my issue clearer now. By the way, I am using MATLAB R2015a.
Use categorical array. Here is your function (without its' headline, and all rows I can't run are commented):
BondData = table(datetime(Settlement),datetime(Maturity),Price,Coupon,...
'VariableNames',{'Settlement','Maturity','Price','Coupon'});
BondData.Settlement = categorical(BondData.Settlement);
Settlements = categories(BondData.Settlement); % get all unique Settlement
for k = 1:numel(Settlements)
rows = BondData.Settlement==Settlements(k);
Settle = BondData.Settlement(rows); % current_row_settlement_value
Mature = BondData.Maturity(rows); % current_row_maturity_value
CleanPrice = BondData.Price(rows);
CouponRate = BondData.Coupon(rows);
Instruments = [datenum(char(Settle)) datenum(char(Mature))...
CleanPrice CouponRate];
% Yield = bndyield(CleanPrice,CouponRate,Settle,Mature);
%
% NSModel = IRFunctionCurve.fitNelsonSiegel('Zero',Settlements(k),Instruments);
% SVModel = IRFunctionCurve.fitSvensson('Zero',Settlements(k),Instruments);
%
% NSModel.Parameters
% SVModel.Parameters
end
Keep in mind the following:
You cannot concat different types of variables as you try to do in: Instruments = [Settle Maturity CleanPrice CouponRate];
There is no need in the structure Bond, you don't use it (e.g. Settle = Bonds.Settle;).
Use the relevant functions to convert between a datetime object and string or numbers. For instance, in the code above: datenum(char(Settle)). I don't know what kind of input you need to pass to the following functions.

Resources