How to know how much disk space a table occupies?

How to know how much disk space a table occupies? - database

code as follow
db=database("dfs://db1",VALUE,1 2 3)
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
t = table(timestamp, sym, qty, price)
dt=db.createTable(t,`dt).append!(t)
How much disk space does this table DT consume?

You can use the getTabletsMeta function to query the disk space usage of a partition table. The code is as below, and the unit of the return value is Byte:
def diskUsage(database, table){
return select sum(diskUsage) from getTabletsMeta("/"+database+"/%", table, true, -1);
}
pnodeRun(diskUsage{"db1", "t1"})

Related

HOW TO FILTER DJANGO QUERYSETS WITH MULTIPLE AGGREGATIONS

Lets say I have a django model table
class Table(models.Model):
name = models.CharField()
date_created = models.DatetimeField()
total_sales = models.DecimalField()
some data for context
Name
date-created
total-sales
a
2020-01-01
200
b
2020-02-01
300
c
2020-04-01
400
*
**********
***
c
2020-12-01
1000
c
2020-12-12
500
now I want to filter an aggregate of
total_yearly_sales = 10500
current month being December
total_monthly_sales = 1500
daily_sales
total_daily_sales = 500
also do a Group by by name
models.Table.objects.values('Name').annotate(Sum('total-sales')).order_by()
I want to do this in one query(one db hit)
Hence the query should generate
total_yearly_sales
total_monthly_sales
total_daily_sales
total_sales_grouped_by_name ie {a:200, b:300, c:1900}
I know this is too much to ask. Hence let me express my immense gratitude and thanks for having a look at this.
cheers
The above queries I can generate them individually like so
today = timezone.now().date()
todays_sales = models.Table.filter(date_created__date__gte=today, date_created___date__lte=today).aggregate(Sum('total_sales'))
=> 500
monthly_sales(this month) = models.Table.objects.filter(date_created__year=today.year, date_created__month=today.month).aggregate(Sum('total_sales'))
=>10500
total_yearly_sales = models.Table.objects.filter(date_created__year=today.year).aggregate(Sum('total_sales')) => 10500

Bybit API - How do I calculate qty in USDT Perpetual with leverage

I'm using bybit-api to create a conditional order but don't know how do I calculate quantity. Is it based on leveraged amount or original?
for example
I have balance of 50 USDT and want to use 100% per trade with following conditions.
BTC at price 44,089.50 with 50x leverage.
SHIB at price 0.030810 with 50x leverage.
How do I calculate the qty parameter?
https://bybit-exchange.github.io/docs/linear/#t-placecond

I trade Bitcoin through USDT Perpetuals (BTCUSDT). I've setup my own python API and created my own function to calculate quantity for cross margin:
def order_quantity(self, price:float, currency:str='USDT', leverage:float=50.0):
margin = self.get_wallet_balance(currency)
instrument = Instrument(self.query_instrument()[0], 'bybit')
if not price: # Market orders
last_trade = self.ws_get_last_trade() # private function to get last trade
lastprice = float(last_trade[-1]['price'])
else: # Limit orders
lastprice = price
totalbtc = float(margin[currency]['available_balance']) * (1 - instrument.maker_fee * leverage)
rawbtc = totalbtc / lastprice
btc = math.floor(rawbtc / instrument.lot_size) * instrument.lot_size
return min(btc,instrument.max_lot_size)

It is based on the leverage amount.
Your quantity should be :
qty = 50 USDT * 50 (leverage) / 44089 (BTC price) = 0.0567 BTC

MAX Partitioning in ETL using SSIS

I have a query in SQL Server where I am using MAX OVER Partition BY.
MAX(Duration) OVER (PARTITION BY [Variable8]
ORDER BY [RouterCallKeySequenceNumber] ASC) AS MaxDuration
I would like to implement this in ETL using SSIS.
To implement, I have tried to implement similar to how we can implement Row Number.
I have added a SORT transformation and sorted by Variable8 and RouterCallKeySequenceNumber and then I have added a Script transformation.
string _variable8 = "";
int _max_duration;
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
_max_duration = Row.Duration;
if (Row.Variable8 != _variable8)
{
_max_duration = Row.Duration;
Row.maxduration = _max_duration;
_variable8 = Row.Variable8;
}
else
{
if (Row.Duration >= _max_duration)
{
Row.maxduration = _max_duration;
}
}
}
This is the data that I have -
Variable8 RouterCallKeySequenceNumber Duration
153084-2490 0 265
153084-2490 1 161
153084-2490 2 197
The solution that I need is as below -
Variable8 RouterCallKeySequenceNumber Duration Max Duration
153084-2490 0 265 265
153084-2490 1 161 265
153084-2490 2 197 265
But this does not return the desired value.
I would appreciate if you can provide any help.
Thanks

I have estimated time on a job but when I add the employee's (in this case 2) hours, it will duplicate the estimated

I have estimated time on a job but when I add the employee's (in this case 2) hours, it will duplicate the estimated. I need to divide by the number of results (maybe employee records) to get the correct answer.
SQL pull from database.
SELECT
LaborDtl.JobNum,
LaborDtl.ClockInDate,
LaborDtl.OprSeq,
EmpBasic.Name,
(LaborDtl.LaborHrs) as [TotalHrs],
((JobOper.EstSetHours + JobOper.EstProdHours) / (COUNT (EmpBasic.Name))) as [TotEstHrs],
LaborDtl.ResourceGrpID
FROM Erp.LaborDtl
left outer JOIN Erp.JobOper ON
JobOper.JobNum = LaborDtl.JobNum
AND JobOper.OprSeq = LaborDtl.OprSeq
JOIN Erp.EmpBasic ON
EmpBasic.EmpID = LaborDtl.EmployeeNum
WHERE LaborDtl.Complete = '1'
AND LaborDtl.ClockInDate = '2019-7-1'
AND LaborDtl.ResourceGrpID = '5-XM-C'
AND LaborDtl.JobNum = 'PA16742'
GROUP BY
LaborDtl.JobNum,
LaborDtl.ClockInDate,
LaborDtl.OprSeq,
EmpBasic.Name,
LaborDtl.LaborHrs,
JobOper.EstSetHours,
JobOper.EstProdHours,
LaborDtl.EmployeeNum,
LaborDtl.ResourceGrpID
JobNum ClockInDate OprSeq Name TotalHrs TotEstHrs ResourceGrpID
pa16742 2019-07-01 20 Jerry Adam 1.6300 5.00 5-XM-C
PA16742 2019-07-01 20 Xue Lee 2.68000 5.00 5-XM-C
In this case, the TotEstHrs should be 2.5 on each line.

I think this does what you want:
((JobOper.EstSetHours + JobOper.EstProdHours) / SUM(COUNT(EmpBasic.Name))
OVER ()) as [TotEstHrs],
It adds the count over all the rows and then does the division.

Sort by a key, but value has more than one element using Scala

I'm very new to Scala on Spark and wondering how you might create key value pairs, with the key having more than one element. For example, I have this dataset for baby names:
Year, Name, County, Number
2000, JOHN, KINGS, 50
2000, BOB, KINGS, 40
2000, MARY, NASSAU, 60
2001, JOHN, KINGS, 14
2001, JANE, KINGS, 30
2001, BOB, NASSAU, 45
And I want to find the most frequently occurring for each county, regardless of the year. How might I go about doing that?
I did accomplish this using a loop. Refer to below. But I'm wondering if there is shorter way to do this that utilizes Spark and Scala duality. (i.e. can I decrease computation time?)
val names = sc.textFile("names.csv").map(l => l.split(","))
val uniqueCounty = names.map(x => x(2)).distinct.collect
for (i <- 0 to uniqueCounty.length-1) {
val county = uniqueCounty(i).toString;
val eachCounty = names.filter(x => x(2) == county).map(l => (l(1),l(4))).reduceByKey((a,b) => a + b).sortBy(-_._2);
println("County:" + county + eachCounty.first)
}

Here is the solution using RDD. I am assuming you need top occurring name per county.
val data = Array((2000, "JOHN", "KINGS", 50),(2000, "BOB", "KINGS", 40),(2000, "MARY", "NASSAU", 60),(2001, "JOHN", "KINGS", 14),(2001, "JANE", "KINGS", 30),(2001, "BOB", "NASSAU", 45))
val rdd = sc.parallelize(data)
//Reduce the uniq values for county/name as combo key
val uniqNamePerCountyRdd = rdd.map(x => ((x._3,x._2),x._4)).reduceByKey(_+_)
// Group names per county.
val countyNameRdd = uniqNamePerCountyRdd.map(x=>(x._1._1,(x._1._2,x._2))).groupByKey()
// Sort and take the top name alone per county
countyNameRdd.mapValues(x => x.toList.sortBy(_._2).take(1)).collect
Output:
res8: Array[(String, List[(String, Int)])] = Array((KINGS,List((JANE,30))), (NASSAU,List((BOB,45))))

You could use the spark-csv and the Dataframe API. If you are using the new version of Spark (2.0) it is slightly different. Spark 2.0 has a native csv data source based on spark-csv.
Use spark-csv to load your csv file into a Dataframe.
val df = sqlContext.read.format("com.databricks.spark.csv")
.option("header", "true")
.option("inferSchema", "true")
.load(new File(getClass.getResource("/names.csv").getFile).getAbsolutePath)
df.show
Gives output:
+----+----+------+------+
|Year|Name|County|Number|
+----+----+------+------+
|2000|JOHN| KINGS| 50|
|2000| BOB| KINGS| 40|
|2000|MARY|NASSAU| 60|
|2001|JOHN| KINGS| 14|
|2001|JANE| KINGS| 30|
|2001| BOB|NASSAU| 45|
+----+----+------+------+
DataFrames uses a set of operations for structured data manipulation. You could use some basic operations to become your result.
import org.apache.spark.sql.functions._
df.select("County","Number").groupBy("County").agg(max("Number")).show
Gives output:
+------+-----------+
|County|max(Number)|
+------+-----------+
|NASSAU| 60|
| KINGS| 50|
+------+-----------+
Is this what you are trying to achieve?
Notice the import org.apache.spark.sql.functions._ which is needed for the agg() function.
More information about Dataframes API
EDIT
For correct output:
df.registerTempTable("names")
//there is probably a better query for this
sqlContext.sql("SELECT * FROM (SELECT Name, County,count(1) as Occurrence FROM names GROUP BY Name, County ORDER BY " +
"count(1) DESC) n").groupBy("County", "Name").max("Occurrence").limit(2).show
Gives output:
+------+----+---------------+
|County|Name|max(Occurrence)|
+------+----+---------------+
| KINGS|JOHN| 2|
|NASSAU|MARY| 1|
+------+----+---------------+

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to know how much disk space a table occupies? - database

Related

HOW TO FILTER DJANGO QUERYSETS WITH MULTIPLE AGGREGATIONS

Bybit API - How do I calculate qty in USDT Perpetual with leverage

MAX Partitioning in ETL using SSIS

I have estimated time on a job but when I add the employee's (in this case 2) hours, it will duplicate the estimated

Sort by a key, but value has more than one element using Scala

Categories

Resources