How can I estimate gas fees before sending a transaction? - pact-lang

I would like to estimate the gas fees for a particular transaction on behalf of my users before sending it.
If I know the current going rate for gas on the Chainweb blockchain, and I know how many units of gas a transaction will consume, then I can multiply them together to provide an estimate of how much it will cost to send a particular transaction.
How do I get those two pieces of information?
First, how can I get the current going rate for gas on Chainweb?
Second, is there a rough way to estimate the number of units of gas a transaction will consume? For example, it costs 21,000 units of gas to transfer Ether from one address to another. How do I determine how many units of gas it will take to transfer KDA from one wallet to another? Or how many units of gas it will take to execute N steps of my contract?

For the going rate, I only know how to get the current minimum gas price. It is set in the default configuration of the Kadena node, currently 0.00000001. That price has always been enough for my transactions to go through.
For estimating the gas amount you can use Pact's gas logging functionality. Load your contract with a local pact executable and use a .repl file with tests to simulate the gas cost of the specific contract calls you will perform.
In your repl test script, enclose the contract calls you want to measure with
(env-gas 0) (env-gaslog)
to reset the gas counter and start logging, and
(env-gaslog)
to display the gas units consumed since the last reset.
Before you can start logging you need to set the gas model to "table" and a sufficiently high gas limit.
Assume you are working on the coin contract and need to know how many units of gas a transaction will consume; you can use a test like the one below:
(env-gasmodel "table")
(env-gaslimit 150000)
(load "fungible-v2.pact")
(load "coin.pact")
(env-gas 0) (env-gaslog)
(create-table coin.coin-table)
(env-gaslog)
To run the above, copy the source code of the coin contract (coin.pact) and the fungible-v2 standard (fungible-v2.pact) into the same folder as this .repl file. You can then run it with:
$ pact -t test.repl
For full reference: https://pact-language.readthedocs.io/en/stable/pact-functions.html#env-gas
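Once you have the gas unit count from the repl log, the fee estimate is simply that count multiplied by the gas price you plan to offer. A minimal Python sketch of the arithmetic, assuming a hypothetical gas_units value taken from your own (env-gaslog) output and the node's default minimum gas price:

# Sketch only: the numbers are placeholders, not measured values.
gas_units = 25000        # hypothetical: read this from your (env-gaslog) output
gas_price = 0.00000001   # default minimum gas price of a Kadena node (KDA per gas unit)
gas_limit = 150000       # the limit you would put in the transaction metadata

estimated_fee = gas_units * gas_price   # expected cost of the call
max_fee = gas_limit * gas_price         # upper bound implied by the gas limit you set
print(f"estimated fee: {estimated_fee:.8f} KDA (at most {max_fee:.8f} KDA)")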

Related

Choosing proper database in AWS when all items must be read from the table

I have an AWS application where DynamoDB is used for most data storage and it works well for most cases. I would like to ask you about one particular case where I feel DynamoDB might not be the best option.
There is a simple table with customers. Each customer can collect virtual coins, so each customer has a balance attribute. The balance is managed by a 3rd-party service that keeps the up-to-date value; the balance attribute in my table is just a cached copy of it. The 3rd-party service requires its own id for the customer as input, so the customers table also contains an externalId attribute, which is used to query the balance.
I need to run the following process once per day:
1. Update the balance attribute for all customers in the database.
2. Find all customers with a balance greater than some specified constant value. They need to be sorted by balance.
3. Perform some processing for all of these customers - the processing must be performed in order, starting from the customer with the greatest balance and proceeding in descending order of balance.
Question: which database is the most suitable one for this use case?
My analysis:
In terms of costs it looks quite similar, i.e. paying for capacity units in the case of DynamoDB vs paying for hours of micro instances in the case of RDS. I'm not sure, though, whether a micro RDS instance is enough for this purpose - I'm going to check it, but I guess it should be.
In terms of performance - I'm not sure here. It's something I will need to check but wanted to ask you here beforehand. Some analysis from my side:
It involves two scan operations in the case of DynamoDB, which looks like something I really don't want to have. The first scan can be limited to the externalId attribute; then the balances are queried from the 3rd-party service and updated in the table. The second scan requires a range key defined on the balance attribute to return customers sorted by balance.
I'm not convinced that any kind of index can help here. Basically, there won't be many read operations on the balance - sometimes it will need to be queried for a single customer using its primary key. The number of reads won't be much greater than the number of writes, so indexes may slow the process down.
Additional assumptions in case they matter:
There are ca. 500 000 customers in the database, the average size of a single customer is 200 bytes. So the total size of the customers in the database is 100 MB.
I need to repeat step 1 from the above procedure (update the balance of all customers) several times during the day (ca. 20-30 times per day) but the necessity to retrieve sorted data is only once per day.
There is only one application (and one instance of the application) performing the above procedure. Besides that, I need to handle simple CRUD which can read/update other attributes of the customers.
I think people are overly afraid of DynamoDB scan operations. They're bad if used for regular queries but for once-in-a-while bulk operations they're not so bad.
How much does it cost to scan a 100 MB table? That's 25,000 4 KB blocks. With eventually consistent reads (half price) that's 12,500 read units. If we assume a cost of $0.25 per million read units (On Demand mode), that's 12,500/1,000,000*$0.25 ≈ $0.003 per full table scan. Want to do it 30 times per day? It costs you less than a dime a day.
The thing to consider is the cost of updating every item in the database. That's 500,000 write units, which if in On Demand at $1.25 per million will be about $0.63 per full table update.
If you can go Provisioned for that duration it'll be cheaper.
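For transparency, the arithmetic above can be reproduced in a few lines. A rough Python sketch, using the On Demand prices quoted here ($0.25 per million reads, $1.25 per million writes) and assuming every item fits in a single write unit:

# Rough cost model for a 100 MB table with 500,000 items (assumptions from the answer).
read_blocks = 100 * 1000 / 4              # 100 MB in 4 KB blocks ~= 25,000
read_units = read_blocks / 2              # eventually consistent reads are half price: 12,500
scan_cost = read_units / 1_000_000 * 0.25        # ~$0.003 per full table scan
daily_scan_cost = 30 * scan_cost                 # ~$0.09 for 30 scans per day

update_cost = 500_000 / 1_000_000 * 1.25         # ~$0.63 per full table update

print(f"scan: ${scan_cost:.4f}  30 scans/day: ${daily_scan_cost:.2f}  "
      f"full update: ${update_cost:.2f}")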
Regarding performance, DynamoDB can scan a full table faster than any server-oriented database, because it's supported by potentially thousands of back-end servers operating in parallel. For example, you can do a parallel scan with up to a million segments, each with a client thread reading data in 1 MB chunks. If you write a single-threaded client doing a scan it won't be as fast. It's definitely possible to scan slowly, but it's also possible to scan at speeds that seem ludicrous.
If your table is 100 MB, was created in On Demand mode, has never hit a high water mark to auto-increase capacity (just the starter capacity), and you use a multi-threaded pull with 4+ segments, I predict you'll be done in low single digit seconds.
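A rough sketch of such a multi-segment scan with boto3 (the table name "customers", the "balance" attribute, and THRESHOLD are placeholders for your own schema); each worker scans its own segment, so the segments run in parallel:

import concurrent.futures
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("customers")   # hypothetical table name
TOTAL_SEGMENTS = 4                    # "4+ segments" as suggested above

def scan_segment(segment):
    """Scan one segment of the table, following pagination."""
    items = []
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

with concurrent.futures.ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    chunks = pool.map(scan_segment, range(TOTAL_SEGMENTS))

customers = [item for chunk in chunks for item in chunk]

# Filter and sort in memory once the scan is done (steps 2 and 3 of the procedure).
THRESHOLD = 100   # hypothetical constant
selected = sorted((c for c in customers if c["balance"] > THRESHOLD),
                  key=lambda c: c["balance"], reverse=True)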

Which Micrometer metric is better to count database values found?

We want to implement a Grafana dashboard that shows in how many database calls a value was found.
I'm not sure which Micrometer metric to use:
Counter: Counters report a single metric, a count.
Timer: Measures the frequency
According to that, I would choose the counter, because I just want to know how many times we find a value in the database.
It depends on the information you hope to capture based on the metric. You most likely want a gauge.
You aren't timing anything so a Timer wouldn't be a good fit.
Counter - is used for measuring values that only go up and can be used to calculate rates. For instance, counting requests.
Gauge - is used for measuring values that go up and down. For instance, CPU usage.
If you are counting values in a database result, that number could go up and down (if that table allows deletion). However, if the amount only goes up, using a counter would make sense, and give you the ability to see the growth rate, but that will only work if you can guarantee the number is only going up.
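Micrometer itself is a Java library, but the counter-versus-gauge distinction is the same in any metrics API. As a hedged illustration, here it is in Python's prometheus_client rather than Micrometer (the metric names and the record_lookup helper are made up for the example):

from prometheus_client import Counter, Gauge

# Counter: monotonically increasing - "how many lookups have found a value so far".
found_counter = Counter("db_value_found_total", "Lookups that found a value")

# Gauge: can go up and down - "how many matching values the last query returned".
result_gauge = Gauge("db_values_found", "Values found by the most recent query")

def record_lookup(rows_found: int) -> None:
    if rows_found > 0:
        found_counter.inc()        # gives you a rate of successful lookups over time
    result_gauge.set(rows_found)   # current result size, which may shrink again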

How to generate hourly weather data for 8760 (Entire Year) using PvLib Python

Instead of reading a TMY file into PvLib, I want to generate weather data using a PvLib function, class, or module.
I have found some functions to generate weather forecasts using these modules: "from pvlib.forecast import GFS, NAM, NDFD, HRRR, RAP".
The above-mentioned methods have some limitations. They generate data only for a limited period; some of the modules generate forecasts for only 7 days or 1 month.
They also return data at a 3-hour timestamp interval.
Is there any way to interpolate weather data for an entire year using PvLib?
Forecasts are generally meant for predicting the future, and their time range and accuracy are inversely related: the further into the future a forecast reaches, the less accurate it becomes. For example, the forecast for today is more accurate than the forecast for tomorrow, and so on. That is why forecasts are limited to something like seven future days.
Forecast providers such as GFS may or may not provide data for historic forecasts; it depends on the provider and their services.
As I remember, GFS delivers predictions as old-fashioned files, so I moved to providers that offer forecasts through online REST services, since I am first a programmer, then a data scientist, and never a meteorologist.
When the time series interval is not the one you need, you can resample it. The extra values will be calculated mathematically with some formula; as long as you don't know the original provider's formula, the resampling formula will likely be different from it.
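In practice that resampling is usually done with pandas on the DataFrame the forecast model returns. A minimal sketch, assuming a 3-hourly DataFrame with a DatetimeIndex (linear interpolation is my choice here, not something PvLib prescribes):

import pandas as pd

# Stand-in for a 3-hourly forecast DataFrame with a DatetimeIndex
# (in practice, e.g. the output of a pvlib.forecast model's get_processed_data()).
idx = pd.date_range("2021-01-01", "2021-12-31 23:00", freq="3H")
df = pd.DataFrame({"ghi": range(len(idx))}, index=idx)

# Upsample to hourly and fill the new timestamps by linear interpolation.
hourly = df.resample("1H").interpolate(method="linear")
print(len(hourly))   # close to 8760 hourly rows for a full year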

Hadoop on periodically generated files

I would like to use Hadoop to process input files which are generated every n minutes. How should I approach this problem? For example, I have temperature measurements of cities in the USA received every 10 minutes, and I want to compute average temperatures per day, per week, and per month.
PS: So far I have considered Apache Flume to get the readings. It will collect data from multiple servers and write it periodically to HDFS, from where I can read and process it.
But how can I avoid working on the same files again and again?
You should consider a Big Data stream processing platform like Storm (which I'm very familiar with; there are others, though), which might be better suited for the kinds of aggregations and metrics you mention.
Either way, however, you're going to implement something that keeps the entire set of already-processed data in a form that makes it easy to apply the delta of just-gathered data and produce your latest metrics. Another output of this merge is a new baseline data set to which you'll apply the next hour's data. And so on.
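To the "how can I avoid working on the same files again" part: whatever platform you pick, the usual trick is to keep a small checkpoint of what has already been processed and/or move processed files aside. A minimal local-filesystem sketch of the idea (paths and names are made up; with Flume + HDFS you would do the same thing against HDFS directories):

import json
from pathlib import Path

INCOMING = Path("incoming")                   # where ingestion drops new files
PROCESSED_MANIFEST = Path("processed.json")   # checkpoint of already-handled files

def load_manifest() -> set:
    if PROCESSED_MANIFEST.exists():
        return set(json.loads(PROCESSED_MANIFEST.read_text()))
    return set()

def run_batch(process) -> None:
    """Process only files not seen before, then record them in the manifest."""
    seen = load_manifest()
    new_files = [p for p in sorted(INCOMING.glob("*.csv")) if p.name not in seen]
    for path in new_files:
        process(path)                          # your aggregation job goes here
        seen.add(path.name)
    PROCESSED_MANIFEST.write_text(json.dumps(sorted(seen)))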

Architecture and pattern for large scale, time series based, aggregation operation

I will try to describe my challenge and operation:
I need to calculate stock price indices over a historical period. For example, I will take 100 stocks and calculate their aggregated average price each second (or even less) for the last year.
I need to create many different indices like this, where the stocks are picked dynamically out of ~30,000 different instruments.
The main consideration is speed. I need to output a few months of this kind of index as fast as I can.
For that reason, I think a traditional RDBMS is too slow, so I am looking for a sophisticated and original solution.
Here is something I had in mind, using a NoSQL or column-oriented approach:
Distribute all stocks into some kind of key-value pairs of time:price with matching time rows across all of them. Then use some sort of map-reduce pattern to select only the required stocks and aggregate their prices while reading them line by line.
I would like some feedback on my approach, suggestions for tools and use cases, or a suggestion of a completely different design pattern. My guidelines for the solution are price (I would like to use open source), the ability to handle huge amounts of data and, again, fast lookup (I don't care about inserts, since they are only made once and never change).
Update: by fast lookup I don't mean real time, but a reasonably quick operation. Currently it takes me a few minutes to process each day of data, which translates to a few hours per yearly calculation. I want to get this down to minutes or so.
In the past, I've worked on several projects that involved the storage and processing of time series using different storage techniques (files, RDBMS, NoSQL databases). In all these projects, the essential point was to make sure that the time series samples are stored sequentially on the disk. This made sure reading several thousand consecutive samples was quick.
Since you seem to have a moderate number of time series (approx. 30,000) each having a large number of samples (1 price a second), a simple yet effective approach could be to write each time series into a separate file. Within the file, the prices are ordered by time.
You then need an index for each file so that you can quickly find certain points of time within the file and don't need to read the file from the start when you just need a certain period of time.
With this approach you can take full advantage of today's operating systems which have a large file cache and are optimized for sequential reads (usually reading ahead in the file when they detect a sequential pattern).
Aggregating several time series involves reading a certain period from each of these files into memory, computing the aggregated numbers and writing them somewhere. To fully leverage the operating system, read the full required period of each time series one by one and don't try to read them in parallel. If you need to compute a long period, then don’t break it into smaller periods.
You mention that you have 25,000 prices a day when you reduce them to a single one per second. It seems to me that in such a time series, many consecutive prices would be the same as few instruments are traded (or even priced) more than once a second (unless you only process S&P 500 stocks and their derivatives). So an additional optimization could be to further condense your time series by only storing a new sample when the price has indeed changed.
On a lower level, the time series files could be organized as binary files consisting of sample runs. Each run starts with the timestamp of the first price and the length of the run. After that, the prices for the consecutive seconds follow. The file offset of each run could be stored in the index, which could be implemented with a relational DBMS (such as MySQL). This database would also contain all the metadata for each time series.
(Do stay away from memory mapped files. They're slower because they aren’t optimized for sequential access.)
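A small Python sketch of the run layout described above, using the struct module (the exact field sizes are my own choice for illustration, not a standard): each run is an 8-byte start timestamp, a 4-byte count, and then count 8-byte prices, and the offset returned by write_run is what you would store in the index database.

import struct

RUN_HEADER = struct.Struct("<qI")   # int64 start timestamp (seconds), uint32 price count
PRICE = struct.Struct("<d")         # one float64 price per consecutive second

def write_run(f, start_ts: int, prices: list) -> int:
    """Append one run of consecutive per-second prices; return its file offset."""
    offset = f.tell()
    f.write(RUN_HEADER.pack(start_ts, len(prices)))
    for p in prices:
        f.write(PRICE.pack(p))
    return offset

def read_run(f, offset: int):
    """Read back one run, starting at the offset stored in the index."""
    f.seek(offset)
    start_ts, count = RUN_HEADER.unpack(f.read(RUN_HEADER.size))
    prices = [PRICE.unpack(f.read(PRICE.size))[0] for _ in range(count)]
    return start_ts, prices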
If the scenario you described is the ONLY requirement, then there are "low tech" simple solutions which are cheaper and easier to implement. The first that comes to mind is LogParser. In case you haven't heard of it, it is a tool which runs SQL queries on simple CSV files. It is unbelievably fast - typically around 500K rows/sec, depending on row size and the IO throughput of the HDs.
Dump the raw data into CSVs, run a simple aggregate SQL query via the command line, and you are done. Hard to believe it can be that simple, but it is.
More info about logparser:
Wikipedia
Coding Horror
What you really need is a relational database with built-in time series functionality. IBM released one very recently: Informix 11.7 (note that it must be 11.7 to get this feature). Even better news is that for what you are doing, the free version, Informix Innovator-C, will be more than adequate.
http://www.freeinformix.com/time-series-presentation-technical.html
