Firebase Firestore Stock Count updating?

I am using Firebase purely to integrate a simple ticket-buying system.
I would think this is a very common scenario, and I am wondering what the solutions are.
I have an issue with the document write-rate limit: it means I can't keep the stock count updated.
Because Firestore documents sustain roughly one write per second and transactions lock the document, the transactions keep timing out when there is a large burst of ticket purchases at one point in time.
For example:
Let's say we have a simple ticket document like this
{
  name: "Taylor Bieber Concert",
  stock: 100,
  price: 1000
}
I use a Firestore transaction server-side that does the following (originally pseudocode; sketched here with the Node.js Admin SDK):
const { FieldValue } = require('firebase-admin/firestore');

await db.runTransaction(async (t) => {
  const ticket = (await t.get(ticketRef)).data();           // get the data of the ticketRef doc
  if (ticket.stock <= 0) return;                            // check the stock is more than 0
  t.update(ticketRef, { stock: FieldValue.increment(-1) }); // remove 1 from the stock value
});
The transaction and functionality all work; however, when 20-100 people try to buy a ticket as it is released, it seems to go into contention and a bunch of the requests time out...
Is there a way to avoid these timeouts? Some sort of queue or something?
I have tried using transactions server-side in Firebase Functions to update the stock value; when many people try to purchase the product simultaneously, it leads to the majority of the transactions being locked out / aborted with code 10 (ABORTED).

Related

Fraud Detection DataStream API tutorial questions

I am following the tutorial here.
Q1: Why, in the final application, do we clear all state and delete the timer whenever flagState = true, regardless of the current transaction amount? I refer to this part of the code:
// Check if the flag is set
if (lastTransactionWasSmall != null) {
    if (transaction.getAmount() > LARGE_AMOUNT) {
        // Output an alert downstream
        Alert alert = new Alert();
        alert.setId(transaction.getAccountId());
        collector.collect(alert);
    }
    // Clean up our state [WHY HERE?]
    cleanUp(context);
}
If the stream of transaction amounts was 0.5, 10, 600, then flagState would be set for 0.5 and then cleared for 10. So for 600 we skip the code block above and don't check for a large amount. But if the 0.5 and 600 transactions occurred within a minute of each other, we should have sent an alert, yet we didn't.
Q2: Why do we use processing time to determine whether two transactions are 1 minute apart? The Transaction class has a timeStamp field, so isn't it better to use event time? Processing time is affected by the speed of the application, so two transactions with event times within 1 minute of each other could be processed more than 1 minute apart due to lag.
A1: The fraud model being used in this example is the one illustrated by the figure in the tutorial: a small transaction immediately followed by a large one.
In your example, the transaction for 600 must immediately follow the transaction for 0.5 to be considered fraud. Because of the intervening transaction for 10, it is not fraud, even if all three transactions occur within a minute. It's just a matter of how the use case was framed.
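To make the flag mechanics concrete, here is a minimal, Flink-free sketch of that per-account state machine in Python (my illustration, not code from the tutorial; it assumes the tutorial's SMALL_AMOUNT = 1.00 and LARGE_AMOUNT = 500.00 thresholds):

SMALL_AMOUNT = 1.00
LARGE_AMOUNT = 500.00

def detect(amounts):
    alerts = []
    last_was_small = False  # stands in for the keyed flag state
    for amount in amounts:
        if last_was_small:
            if amount > LARGE_AMOUNT:
                alerts.append(amount)  # small-then-large pattern matched
            last_was_small = False     # flag is cleared regardless of amount
        if amount < SMALL_AMOUNT:
            last_was_small = True      # arm the flag for the next transaction
    return alerts

print(detect([0.5, 10, 600]))  # [] -- the 10 in between clears the flag
print(detect([0.5, 600]))      # [600] -- 600 immediately follows the small one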
A2: Doing this with event time would be a very valid choice, but would make the example much more complex. Not only would watermarks be required, but we would also have to sort the stream by event time, since a realistic example would have to consider that the events might be out-of-order.
At that point, implementing this with a process function would no longer be the best choice. Using the temporal pattern matching capabilities of either Flink's CEP library or Flink SQL with MATCH_RECOGNIZE would be the way to go.

Structure data in app engine ndb and speed up query

I am looking for some help on the best way to structure data in App Engine ndb using Python, process it, and query it later. I want to store temperature data at hourly intervals for different geographical regions.
I can think of two entity options, but there may be something much better. The first would be to store the hourly temperatures in individual properties:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    h00 = ndb.FloatProperty()  # 00:00 (property names cannot start with a digit)
    h01 = ndb.FloatProperty()  # 01:00
    ...
    h23 = ndb.FloatProperty()  # 23:00
Or I could store one reading per entity:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    time = ndb.TimeProperty()
    temp = ndb.FloatProperty()
(it might be better to store date and time as one property?)
I want to be able to query the datastore to calculate the total, max, min, and average temperature for any given date range. With the first option I could potentially create 4 more properties to effectively pre-process and store the total, max, etc. for each day, so that to query the total temperature for a year I would only have to sum 365 values as opposed to 8,760. I'm not sure how I would do this with the second option.
I am relatively new to App Engine and the datastore, and I think I am still thinking in terms of relational DBs, so any help would really be appreciated. Later on it might be necessary to store data in different time zones.
Thanks
Paul
Personally, I'd go with a variant of the first approach:
class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    temp = ndb.FloatProperty(repeated=True)
using the temp list to store temperatures by hour, in order, as you learn about them. I don't think the per-date preprocessing will add much: to compute whatever you need for a year, you'd still have to fetch 365 entities, and the delay for that will swamp the tiny amount of time required to sum up a few thousand numbers anyway.
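For instance, a date-range aggregation over that layout might look something like this (a rough sketch; the function name is mine, and it assumes every hour has been filled in):

from google.appengine.ext import ndb

class TempData(ndb.Model):
    region = ndb.StringProperty()
    date = ndb.DateProperty()
    temp = ndb.FloatProperty(repeated=True)  # hourly readings, index 0 = 00:00

def range_stats(region, start_date, end_date):
    qry = TempData.query(TempData.region == region,
                         TempData.date >= start_date,
                         TempData.date <= end_date)
    # flatten all hourly readings in the range (assumes at least one entity)
    readings = [t for entity in qry for t in entity.temp]
    return {'total': sum(readings),
            'max': max(readings),
            'min': min(readings),
            'avg': sum(readings) / len(readings)}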
In general, preprocessing is useful if you want to handily query by the new fields you create by such processing (e.g. to rapidly answer the question "which dates in locale X had average temperatures greater than 20 Celsius"). That does not seem to be your use case.
If anything, if it's common for you to have to compute many-month values, preprocessing to aggregate things per month (into simpler TempDataMonth entities) may be more useful. Or any other multi-day period you find useful, of course (weeks, ten-day groups, whatever). Those could be computed by a background task that periodically checks which such periods have become complete since the last check. But this is a bit beyond your question, so I won't get into fine-grained details.
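If you do go that route, the per-month aggregate entity could be as simple as this (a sketch; the names are mine):

from google.appengine.ext import ndb

class TempDataMonth(ndb.Model):
    region = ndb.StringProperty()
    month = ndb.DateProperty()     # e.g. the first day of the month
    total = ndb.FloatProperty()
    maximum = ndb.FloatProperty()
    minimum = ndb.FloatProperty()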
The general idea is that minimizing the number of entities to fetch tends to be the single most important optimization; other optimizations are of course also possible, but they tend to play second fiddle to that :-).

How to store 10 numbers (updated weekly) with GAE?

My GAE app will request weekly data from Google Analytics like
number of visitors during the last week
number of visitors of a particular page during the last week
etc.
Then I would like to show this data on my GAE web page with Google Charts. The data will be shown for the last X weeks (let's say, 10 weeks).
What is the best approach to store this data (number of metrics multiplied by number of weeks)? Old data could be deleted.
I don't think I should use the datastore like this:
class Visitors(ndb.Model):
    week1 = ndb.IntegerProperty(default=0)  # should store week start and end dates also
    week2 = ndb.IntegerProperty(default=0)
    ...
Probably, it would be better to store data like:
class Analytics(ndb.Model):
    visitors = ndb.StringProperty(default='')  # comma-separated values like '1000,1001,1002'; last value is the previous week
    page_visitors = ndb.IntegerProperty(repeated=True)  # [1000, 1001, 1002]; repeated properties cannot take a default
    ...
What are you trying to optimize?
With this amount of data, you will pay pennies, or less, for data storage. You are well within the free quota on datastore reads and writes. Performance-wise, the difference is negligible.
I would recommend going with the most straightforward solution: each week is a new entity, each data point is in its own property.
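For example, something like this (a sketch; the entity and property names are illustrative):

from google.appengine.ext import ndb

class WeeklyStats(ndb.Model):
    week_start = ndb.DateProperty()
    visitors = ndb.IntegerProperty(default=0)
    page_visitors = ndb.IntegerProperty(default=0)

# Fetch the last 10 weeks for the chart, newest first:
recent = WeeklyStats.query().order(-WeeklyStats.week_start).fetch(10)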

How to keep track of changing items in a stock portfolio?

I have a system where people can pick some stocks and it values their portfolios, but I'm having trouble doing this efficiently on a daily basis because I'm creating entries for days that don't have any changes (think of it as measuring the values with version control, so I can track changes to the way the portfolio is designed).
Here's an example (each day's portfolio, with stock name and weight):
Day 1:
ibm = 10%
microsoft = 50%
google = 40%
Day 5:
ibm = 20%
microsoft = 20%
google = 40%
cisco = 20%
I can measure the value of the portfolio on day 1 and understand I need to measure it again on day 5 (when it changed), but how do I measure days 2-4 without recreating day 1's entry in the database?
My approach right now (which I don't like) is to create a temp entry in my database whenever someone changes the portfolio; then at the end of the day, when I calculate the values, I use the temp entry if there is one, and otherwise I create a new entry (for days 2-4) using the previous day's data. The issue is that, since the data often doesn't change, I'm creating entries that are basically duplicates. The catch: my stock data is all daily. I also thought of taking the portfolio and, if it hasn't been updated in 3 days, finding the returns of the last 3 days for each stock, but I wasn't sure if there was a better solution.
Any ideas? I think this is a straightforward problem, but I just can't see an efficient way of doing it.
Note: in finance terms, this is called calculating a NAV (net asset value), and most firms do it the inefficient way I'm doing it, but that's because the process was created about 50 years ago and hasn't changed. I think this problem is very similar to version control, but I can't seem to arrive at a solution.
In storage terms it makes the most sense to just store the changes:
UserId - StockId1 - 23% - 2012-06-25
UserId - StockId2 - 11% - 2012-06-26
UserId - StockId1 - 20% - 2012-06-30
So you can see that stock 1 went down on the 30th. Now if you want to know the StockId1 percentage on the 28th, you just select:
SELECT *
FROM stocks                       -- column names are illustrative
WHERE user_id = 'UserId'
  AND stock_id = 'StockId1'
  AND datecolumn <= '2012-06-28'
ORDER BY datecolumn DESC
LIMIT 1;
If it gives nothing back, you did not hold it; otherwise you get the last position back.
By the way, if you need, for example, a graph of stock 1, you could LEFT JOIN against a table full of dates. Then you can fill in the gaps easily (see the small sketch after the linked query below).
Found this post here for example:
UPDATE mytable
SET number = (@n := COALESCE(number, @n))
ORDER BY date;
SQL QUERY replace NULL value in a row with a value from the previous known value
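For illustration, here is the same carry-forward idea in plain Python (a sketch with made-up data, not code from the linked post):

import datetime

# change rows: date -> weight of StockId1, present only on days it changed
changes = {datetime.date(2012, 6, 25): 23, datetime.date(2012, 6, 30): 20}
day = datetime.date(2012, 6, 25)
last = None
while day <= datetime.date(2012, 7, 1):
    last = changes.get(day, last)  # keep the previous value when nothing changed
    print(day, last)               # days 2-4 reuse day 1's value implicitly
    day += datetime.timedelta(days=1)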

Social feed system design

http://twitter.com/#!/ladygaga
When ladygaga tweets 1 message, does it mean inserting 1 data record for EACH of her followers (12,221,751 in total)? So 12,221,751 records are inserted?
Any clues on designing such a social feed system?
------------------------------- Edit line -------------------------------
Real issue:
Performing SELECT tweet FROM Tweets WHERE owner IN ([FollowingIDs]) is not possible in Google App Engine, which limits the IN clause to a maximum of 30 items.
In App Engine an IN query actually means performing up to 30 queries in parallel, which is not very wise to do, I guess.
Even if I were allowed to exceed the 30-item limit, what if I am subscribing to 10,000 people? I am not sure whether there would be performance issues with the IN clause in MySQL or any other kind of database infrastructure.
(App Engine's Bigtable is different from MySQL.)
So is it better to query with the IN clause?
Or to set up a UserFeed table for storing the feed relationships?
Or a third method?
Database/SQL gurus, please help.
Please see this talk from Google I/O 2009 to see how to handle these sorts of cases on App Engine with a 'fan-out' data structure.
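The gist of the fan-out-on-write approach, sketched with ndb in Python (my illustration, not code from the talk; all names are made up):

from google.appengine.ext import ndb

class FeedEntry(ndb.Model):
    recipient = ndb.StringProperty()  # the follower who should see the tweet
    author = ndb.StringProperty()
    tweet_id = ndb.StringProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)

def fan_out(author, tweet_id, follower_ids):
    # One feed record per follower, written when the tweet is posted, so
    # reading a feed is a single query; in practice this runs on a task queue.
    ndb.put_multi([FeedEntry(recipient=f, author=author, tweet_id=tweet_id)
                   for f in follower_ids])

# Reading genesis's feed is then just:
feed = FeedEntry.query(FeedEntry.recipient == 'genesis').order(-FeedEntry.created).fetch(20)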
Can you imagine it? No. Each person has a list of followers:
ID       | FOLLOWER__ID
ladygaga | genesis
ladygaga | user
//php
$select = array();
$result = mysql_query("SELECT ID FROM followers WHERE FOLLOWER__ID = 'genesis';");
while ($row = mysql_fetch_assoc($result)) {
    $select[] = $row['ID'];
}
// quote the IDs, since they are strings like 'ladygaga'
$tweets = mysql_query("SELECT * FROM tweets WHERE owner IN ('" . implode("','", $select) . "')");
