I was recently picked to lead a longitudinal LTV model for our analytics department. The final deliverable is for external stakeholders, so essentially it should show how much lifetime value the users on our platform (I can't name the company) provide to our external partners.
We'll be building this model from the ground up. We have nothing in place for this currently, just a sea of data (assume very generic assets, e.g. users, sign-ups, user interactions with the platform, etc.).
So... where do I even start? I've just been reading random docs on Google for the time being. Are there any specific resources that are good? Are there different LTV methodologies? What's the "best" one (please take that with a grain of salt)?
I know this is an extremely broad topic so any answers even loosely related to LTV will hold significant value. Thanks all
I haven't tried anything yet. Just reading up on a few resources.
The first thing you want to do is lay out the reasoning for having LTV: what it's going to be used for, and by whom. I'll give some examples, but it will have to be tailored to your industry and your business.
Next, hold a series of meetings with all the stakeholders so that they agree on a good definition of LTV, under the close guidance of someone who understands the data, or at least understands which dimensions should influence it and what format it has to be in.
An example would be: you have an app that offers seven products. The first two products are freebies. The third requires an email address to get. The fourth product is just one dollar per month, the fifth costs a hundred dollars as a one-time payment, the sixth is $20/month, and the final product is an enterprise/B2B-level solution.
An arbitrary model would be to have something like:
No products (guests) => LTV = 0
Product 1 => LTV + 1
Product 2 => LTV + 1
Product 3 => LTV + 3
Product 4 => LTV + 10/month of subscription
Product 5 => LTV + 1000
Product 6 => LTV + 200/month of subscription
Product 7 => LTV + 10k/month of subscription
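The scoring list above can be sketched in a few lines of Python. This is only an illustration: the weights are copied from the example, while the function name `ltv_score` and the input shapes (a set of acquired products plus a dict of subscription months) are assumptions made for the sketch.

```python
# Hypothetical weights copied from the example list above; real values
# would come out of the stakeholder meetings.
ONE_TIME_POINTS = {1: 1, 2: 1, 3: 3, 5: 1000}
MONTHLY_POINTS = {4: 10, 6: 200, 7: 10_000}


def ltv_score(one_time_products, subscription_months):
    """Return an integer LTV score for one user.

    one_time_products: set of product numbers the user has acquired
    subscription_months: dict of product number -> months subscribed
    """
    score = sum(ONE_TIME_POINTS.get(p, 0) for p in one_time_products)
    score += sum(MONTHLY_POINTS.get(p, 0) * months
                 for p, months in subscription_months.items())
    return score


guest = ltv_score(set(), {})              # no products at all
subscriber = ltv_score({1, 3}, {4: 6})    # products 1 and 3, six months of product 4
```

Keeping the weights in plain dictionaries means the stakeholders can change them without touching the scoring logic.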
Then the LTV stakeholders, mainly business owners and PMs, refine the model depending on what kinds of analysis they typically need conducted. That basically depends on what and how they report to their executives or the board.
This is if you want to go with a simple integer as an LTV, which is most commonly used for weighting users. An integer is a very comfortable starting point since it allows for easy mathematical aggregation and makes your user-based analysis more robust. Say you found out that 2% of your users encounter a certain issue that blocks them from navigating somewhere or finishing a process. How should it be prioritized? Should it just be ignored? Should it be addressed immediately?
Well, that depends on who those users are. If they're just free users or even just guests and the error is not blocking them from product onboarding, then it's worth adding a ticket to the backlog, but realistically the fix won't get released any time soon, if ever.
However, if those users are enterprise customers, then the issue not only has to be hotfixed, it has to be hotfixed immediately, probably paying overtime to the devs, QA and DevOps to work late today.
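To make the prioritization point concrete, here is a small Python sketch that compares the share of users hit by an issue with the share of total LTV they represent. The user records, the `hit` flag and the function name `affected_share` are all made up for the illustration.

```python
def affected_share(users, is_affected):
    """Return (share of users affected, share of total LTV affected)."""
    total_users = len(users)
    total_ltv = sum(u["ltv"] for u in users)
    hit = [u for u in users if is_affected(u)]
    user_share = len(hit) / total_users if total_users else 0.0
    ltv_share = sum(u["ltv"] for u in hit) / total_ltv if total_ltv else 0.0
    return user_share, ltv_share


# Made-up population: 97 guests, one small subscriber, and two affected
# users, one of whom is an enterprise customer.
users = (
    [{"id": i, "ltv": 0, "hit": False} for i in range(97)]
    + [{"id": 97, "ltv": 10, "hit": False}]
    + [{"id": 98, "ltv": 10_000, "hit": True}, {"id": 99, "ltv": 0, "hit": True}]
)
user_share, ltv_share = affected_share(users, lambda u: u["hit"])
```

In this invented data set only 2% of users are affected, yet they carry almost all of the LTV, which is exactly the case where the fix should jump the queue.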
Generally, LTV should be a user-level dimension. There are session-level implementations of it, but they're far more difficult.
From a technical standpoint, LTV is most commonly implemented at the tracking stage, typically in a tag management system (TMS) such as Google Tag Manager (GTM), by a tracking specialist.
Another way is to implement it in or after ETL, which is done by the data engineers or data scientists.
My project looks like this: my data set is a bunch of profiles of people, with various attributes, e.g. boolean hasJob and int healthScore, and their income. Using this data, I'm trying to predict their income for the future. Each profile also has a history: e.g., what their attributes and income were in the past.
So in essence I'm trying to map multiple sets of (x booleans, y numbers) to a number (salary in the coming year).
I've considered neural networks, Bayes nets, and genetic algorithms for function-fitting. Any suggestions or input?
Thanks in advance!
--Emily
What you want to do is called "time series modeling". However, you probably have very little data per series (per person). I think it is difficult to find one model that fits every person, because you have to make general assumptions, e.g. that everyone is equally career-oriented.
This is also a very noisy target. It could be, for example, that you have to take into account whether someone is a sweet-talker or not. How do you measure such a thing? I'm pretty sure your current attributes are noisy enough to make it difficult to predict anything. When you say health score, do you mean physical health only, or mental health too? Different things are important in different businesses. What about the business or industry they are working in, and its health and growth potential? I would assume this strongly influences their income.
I also think that your attributes are not independent of each other, and that some of them could be (and likely are) influenced by your target variable; e.g. people with higher income tend to have better health.
It sounds like a very complex and difficult thing, and definitely not something where "I naively grouped my data and tried a bunch of methods" is going to give meaningful results. I would suggest learning more about time series modeling, and especially about the data that you have. Maybe start out by clustering persons by their initial attributes and see how they develop. Are there any variables that correlate with this development?
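As a very rough first look at "group people by their initial attributes and see how they develop", something like the following Python sketch could be a starting point. It is not a time series model or a real clustering method, just a group-by on the first record of each profile. The profile data, the tuple layout and the function name are invented for the illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up profiles: each has a yearly history of (has_job, health_score, income).
profiles = {
    "a": [(True, 80, 30_000), (True, 82, 33_000)],
    "b": [(True, 60, 28_000), (True, 58, 29_400)],
    "c": [(False, 70, 10_000), (True, 72, 10_500)],
    "d": [(False, 40, 9_000), (False, 41, 9_000)],
}


def growth_by_initial_group(profiles, group_of):
    """Mean relative income growth (first to last record) per initial group."""
    groups = defaultdict(list)
    for history in profiles.values():
        first, last = history[0], history[-1]
        if first[2] > 0:  # skip zero starting income to avoid dividing by zero
            groups[group_of(first)].append(last[2] / first[2] - 1)
    return {group: mean(values) for group, values in groups.items()}


# Group by the initial has_job flag; any function of the first record works.
growth = growth_by_initial_group(profiles, lambda record: record[0])
```

If the groups show clearly different growth, that attribute is worth a closer look; if not, it says little either way with so few observations.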
What is your research question?
Recently I've been given an assignment to work on a new project in C++, and I've spent some days and nights thinking about how to approach it. Unfortunately the thinking hasn't worked out, and I'm currently struggling with the design phase. Basically the premise is:
"The management of Ruddles, a well known local department store has decided to implement certain changes to the customer tills throughout the store. Because of problems with staff in providing customers with correct change for their purchases, in future the change will be calculated automatically from the price of the goods and the amount tendered by the customer. The coins will be dispensed automatically by the tills, which are about to be replaced with machines capable of mechanical dispensing."
This is just some information about the business itself. The following is the requirement specification given to me:
"As at present, the member of staff (the cashier) will enter the purchases, and the system will calculate the total cost. This part of the system currently operates satisfactorily, and no changes (!) are envisaged. The total cost will be passed to the new machine, and the amount tendered by the customer will also be entered.
The system will then calculate the amount of change due, and will provide the hardware interface with a list of coin denominations and the number of each to be dispensed.
This constitutes the first part of the exercise. However, management has just realised that the coin dispensing machines will need to be replenished with coins from time to time, and would like the system to be able to keep track of the numbers of coins of different denominations, and issue warnings when the number of coins of any given denomination falls below a certain value. You should therefore include facilities to provide such provision in your solution.
At the present time, the hardware is unable to dispense notes, but future developments may make it possible, and your solution should be such as to cope with this should these improvements occur. You should therefore indicate what changes would be necessary to the software in the event that such hardware upgrading takes place."
At the moment I'm really unsure how to approach these requirements, so I'm wondering if any of you who are far more knowledgeable and experienced than me can lend some advice or suggestions. Your time and attention will be much appreciated :-)
I'm aware that there may be multiple ways to approach this, but so far I understand that arrays need to be used, and a persistence structure as well, although I'm not entirely sure.
Fortunately it doesn't have to be overly complete, so long as it does the job. Again, I appreciate any advice or tips you can give me. Thank you.
The described "software" has inputs: the total purchase, and the total tendered by the customer, and outputs: a list of coin denominations and the number of each coin. Sounds like the definition of a function to me. Figure out a good data type for each parameter, and write out the signature of your function.
If I were you, I'd start simple. Pretend you've only got one coin type, and come up with something that works for that. Then extend the design to two coins of different values. After that you'll have a far better feel for what is needed, and you should be able to see how to extend your solution to cope with coins of several different denominations.
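One possible shape for that function is sketched below, in Python for brevity; the same structure carries over to C++ with a `std::map<int, int>` or a pair of arrays. The coin values are an assumption (UK-style, in pence), and `make_change` is a name made up for the sketch. Working in whole pence avoids floating-point rounding problems.

```python
# Assumed UK-style coin values in pence, largest first; swap in the real list.
COINS = [200, 100, 50, 20, 10, 5, 2, 1]


def make_change(total_cost, tendered, denominations=COINS):
    """Return {denomination: count} for the change due, largest coins first.

    Amounts are integers in pence so no floating-point rounding creeps in.
    """
    if tendered < total_cost:
        raise ValueError("amount tendered is less than the total cost")
    remaining = tendered - total_cost
    change = {}
    for coin in sorted(denominations, reverse=True):
        count, remaining = divmod(remaining, coin)
        if count:
            change[coin] = count
    if remaining:
        raise ValueError("change cannot be made with these denominations")
    return change


one_coin_type = make_change(40, 100, [10])   # the "only one coin type" step
full_set = make_change(1263, 2000)           # 7.37 change from a 20.00 payment
```

Because the denominations are just a list passed in, supporting notes later would mostly mean adding larger values to that list, provided the hardware interface accepts them.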
And also forget about the persistence bit until you've got some of the other basics working.
If you try and design the whole thing in your head without actually implementing any code, your brain might explode. So, break the problem down into chunks you can manage.
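Once the basic change calculation works, the coin-tracking requirement is one of those manageable chunks. A minimal Python sketch follows, assuming a made-up `CoinStock` class and an arbitrary warning threshold. Note that a greedy pick limited by stock can fail even when some other combination of coins would work, so treat this as a starting point rather than a complete solution.

```python
class CoinStock:
    """Tracks how many coins of each denomination are left in the till."""

    def __init__(self, counts, warn_below=5):
        self.counts = dict(counts)      # denomination -> coins available
        self.warn_below = warn_below    # hypothetical warning threshold

    def dispense(self, amount):
        """Greedily pick coins for `amount`, limited by what is in stock."""
        picked = {}
        for coin in sorted(self.counts, reverse=True):
            use = min(amount // coin, self.counts[coin])
            if use:
                picked[coin] = use
                amount -= coin * use
        if amount:
            raise ValueError("cannot make exact change from current stock")
        for coin, use in picked.items():   # commit only once change is exact
            self.counts[coin] -= use
        return picked

    def low_denominations(self):
        """Denominations whose count has fallen below the threshold."""
        return sorted(c for c, n in self.counts.items() if n < self.warn_below)


stock = CoinStock({50: 1, 20: 12, 10: 12})
picked = stock.dispense(120)        # only one 50 is available
low = stock.low_denominations()     # denominations that need replenishing
```

Persistence can come last: saving and loading the `counts` mapping is all that is needed to carry the stock levels between runs.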
It sounds like a fun problem, good luck!
I'm looking to build an AI system to "pick" a fantasy football team. I have only basic knowledge of AI techniques (especially when it comes to game theory), so I am looking for advice on what techniques could be used to accomplish this and pointers to some reading materials.
I am aware that this may be a very difficult or maybe even impossible task for AI to complete accurately. However, I am not too concerned about accuracy; rather, I am interested in learning some AI, and this seems like a fun way to apply it.
Some basic facts about the game:
A team of 14 players must be picked
There is a limit on the total cost of players picked
The players picked must adhere to a certain configuration (there must always be one goalkeeper, at least two defenders, one midfielder and one forward)
The team may be altered on a weekly basis, but removing/adding more than one player a week will incur a penalty
P.S. I have stats on every match played in last season, could this be used to train the AI system?
This is interesting.
So if you didn't really care about accuracy at all, you could just come up with some heuristic for the quality of a team. For instance, assign a point value to each player and then try to maximize it using dynamic programming. Something like: http://www.cse.unl.edu/~goddard/Courses/CSCE310J/Lectures/Lecture8-DynamicProgramming.pdf
This would be similar to the knapsack problem.
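A minimal sketch of that knapsack-style idea in Python, assuming each player has an integer cost and a precomputed point value. The function name, the tiny player list and the budget are invented. The positional rules (goalkeeper, defenders and so on) are deliberately left out; they would need extra state in the table.

```python
def pick_team(players, budget, team_size):
    """Pick `team_size` players maximising total points within `budget`.

    players: list of (name, cost, points) with integer costs.
    Returns (total_points, [names]) or None if no valid team exists.
    """
    # best[(count, cost)] = (points, names) for the best team found so far
    best = {(0, 0): (0, [])}
    for name, cost, points in players:
        # Iterate over a snapshot so each player is used at most once.
        for (count, spent), (score, names) in list(best.items()):
            key = (count + 1, spent + cost)
            if key[0] > team_size or key[1] > budget:
                continue
            candidate = (score + points, names + [name])
            if key not in best or candidate[0] > best[key][0]:
                best[key] = candidate
    full = [v for (count, _), v in best.items() if count == team_size]
    return max(full, key=lambda v: v[0]) if full else None


# Made-up players: (name, cost, points)
players = [("A", 5, 10), ("B", 4, 8), ("C", 3, 7), ("D", 6, 12)]
team = pick_team(players, budget=9, team_size=2)
```

The table has at most `team_size * budget` states, so this stays fast as long as costs are small integers (for example, prices in units of 0.1m).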
Technically this is AI since a computer is deciding something but maybe not what you had in mind.
You sound like you want a learning AI (http://en.wikipedia.org/wiki/Machine_learning) which is an interesting field. Here's how you can approach the problem.
Define your inputs. Right now you have last year's data. You'll probably want data on many years. Also, you might be able to include pundits' rankings; maybe a bunch of magazines rank players or something, which seems useful as well.
Take your inputs and feed them into some machine learning algorithm for each season. Wikipedia will help you out there.
Essentially, for each season you'll want to feed in your data, have your AI pick a team, and then rate the performance of the team based on that season's results.
Keep doing this and maybe your bot will get better at picking teams, and you can apply it to this year's data.
(If you only have last year's data, it's okay to train the algorithm with just that, but your AI will probably be overfitted to that one set and won't be as accurate.)
This was just a sketch of how it might look. For a romp into AI, this problem is probably pretty hard so don't feel disheartened if it seems overwhelming at first.
This is a question not really about "programming" (it is not specific to any language or database), but more about design and architecture. It's also a question of the type "What's the best way to do X". I hope it does not cause too much "religious" controversy.
In the past I have developed systems that, in one way or another, keep some form of inventory of items (it's not relevant what items). Some used languages/DBs that do not support transactions. In those cases I opted not to save the item quantity on hand in a field in the item record. Instead, the quantity on hand is calculated by totaling inventory received - total of inventory sold. This has resulted in almost no discrepancies in inventory caused by software. The tables are properly indexed and the performance is good. There is an archiving process in case the number of records starts to affect performance.
Now, a few years ago I started working at this company, and I inherited a system that tracks inventory. But the quantity is saved in a field. When an entry is registered, the quantity received is added to the quantity field for the item. When an item is sold, the quantity is subtracted. This has resulted in discrepancies. In my opinion this is not the right approach, but the previous programmers here swear by it.
I would like to know if there is a consensus on what the right way to design such a system is. Also, what resources are available, printed or online, to seek guidance on this?
Thanks
I have seen both approaches at my current company and would definitely lean towards the first (calculating totals based on stock transactions).
If you are only storing a total quantity in a field somewhere, you have no idea how you arrived at that number. There is no transactional history and you can end up with problems.
The last system I wrote tracks stock by storing each transaction as a record with a positive or negative quantity. I have found it works very well.
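A minimal sketch of that kind of ledger, using Python's built-in `sqlite3` module. The table name `stock_transaction`, its columns and the sample rows are invented for the illustration, not taken from the system described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per stock movement, positive = in, negative = out.
conn.execute(
    "CREATE TABLE stock_transaction ("
    " id INTEGER PRIMARY KEY, item_id INTEGER NOT NULL,"
    " quantity INTEGER NOT NULL, reason TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO stock_transaction (item_id, quantity, reason) VALUES (?, ?, ?)",
    [(1, 100, "receipt"), (1, -30, "sale"), (1, -2, "damaged"), (2, 50, "receipt")],
)


def on_hand(conn, item_id):
    """Current quantity is simply the sum of every movement for the item."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM stock_transaction WHERE item_id = ?",
        (item_id,),
    ).fetchone()
    return row[0]
```

Because every change is a row, the `reason` column doubles as the audit trail that a single stored total cannot give you.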
The Data Model Resource Book, Vol. 1: A Library of Universal Data Models for All Enterprises
The Data Model Resource Book, Vol. 2: A Library of Data Models for Specific Industries
The Data Model Resource Book: Universal Patterns for Data Modeling
I have Vol 1 and Vol 2 and these have been pretty helpful in the past.
It depends. Inventory systems are about far more than just counting items. For example, for accounting purposes, you might need to know the accounting value of the inventory based on a FIFO (first in, first out) model. That can't be calculated by a simple "totaling inventory received - total of inventory sold" formula. But their model might calculate this easily, because they modify the accounting value as they go. I don't want to go into details because this is not a programming issue, but if they swear by it, maybe you haven't fully understood all the requirements they have to accommodate.
Both are valid, depending on the circumstances. The former is best when the following conditions hold:
the number of items to sum is relatively small
there are few or no exceptional cases to consider (returns, adjustments, et al.)
the inventory item quantity is not needed very often
On the other hand, if you have a large number of items, several exceptional cases, and frequent access, it will be more efficient to maintain the item quantity.
Also note that if your system has discrepancies then it has bugs, which should be tracked down and eliminated.
I have done systems both ways, and both ways can work just fine, as long as you don't ignore the bugs!
It's important to consider the existing system and the cost and risk of changing it. I work with a database that stores inventory kind of like yours does, but it includes audit cycles and stores adjustments just like receipts. It seems to work well, but everyone involved is well trained, and the warehouse staff aren't exactly quick to learn new procedures.
In your case, if you're looking for a little more tracking without changing the whole DB structure, then I'd suggest adding a tracking table (kind of like the one in your 'transaction' solution) and logging changes to the inventory level there. It shouldn't be too hard to update most of the code that changes the inventory level so that it also leaves a transaction record. You could also add a periodic task that backs up the inventory level to the transaction table every couple of hours or so, so that even if you miss a transaction you can discover when the change happened or roll back to a previous state.
If you want to see how a large application does it, take a look at SugarCRM. They have an inventory management module, though I'm not sure how it stores the data.
I think this is actually a general best-practices question about doing a (relatively) expensive count every time you need a total vs. doing that count every time something changes, then storing the count in a field and reading that field whenever you need a total.
If I couldn't use transactions, I would go with the live count every time I needed a total. If transactions are available, it would be safe to perform the inventory update operations and the saving of the re-counted total within the same transaction, which would ensure the accuracy of the count (although I'm not sure this would work with multiple users hitting the database).
But if performance is not really a huge problem (and modern databases are good enough at counting rows that I would rarely even worry about this) I'd just stick with the live count each time.
I would opt for the first way, where "the quantity on hand is calculated totaling inventory received - total of inventory sold".
The Right Way, IMO.
EDIT: I would also want to factor in any stock losses/damages into the system, but I'm sure you have that covered.
I've worked on systems that solve this problem before. I think the ideal solution is a precomputed column, which gets you the best of both worlds. Your total would be a field somewhere, thus no expensive lookups, but it can't get out of sync with the rest of your data (the database maintains the integrity). I don't remember which RDBMSs support precomputed columns, but if you don't have transactions, that might not be available either.
You could potentially fake precomputed columns (very effectively... I see no downside) using triggers. You'd probably need transactions though. IMHO, keeping data integrity when you're doing this sort of controlled denormalization is the only legitimate use for a trigger.
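As an illustration of the trigger idea, here is a small sketch using SQLite through Python's built-in `sqlite3` module. The table, column and trigger names are made up, and only the insert case is covered; a real system would need matching triggers for updates and deletes, and other databases use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, on_hand INTEGER NOT NULL DEFAULT 0);
CREATE TABLE stock_transaction (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(id),
    quantity INTEGER NOT NULL
);
-- Keep item.on_hand in step with every inserted movement.
CREATE TRIGGER stock_transaction_ai AFTER INSERT ON stock_transaction
BEGIN
    UPDATE item SET on_hand = on_hand + NEW.quantity WHERE id = NEW.item_id;
END;
""")
conn.execute("INSERT INTO item (id) VALUES (1)")
conn.executemany(
    "INSERT INTO stock_transaction (item_id, quantity) VALUES (?, ?)",
    [(1, 100), (1, -30)],
)
# The stored total and the recomputed total should always agree.
stored = conn.execute("SELECT on_hand FROM item WHERE id = 1").fetchone()[0]
summed = conn.execute(
    "SELECT SUM(quantity) FROM stock_transaction WHERE item_id = 1"
).fetchone()[0]
```

Application code only ever inserts movements; it never writes `on_hand` directly, which is what keeps the denormalized total trustworthy.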
Django-inventory is geared more toward fixed assets, but it might give you some ideas.
I.e.: ItemTemplate (class) -> ItemsOnHand (instance)
ItemsOnHand can be linked to more ItemTemplates; for example, a printer and the ink cartridges it requires. This also allows setting reorder points for each ItemsOnHand.
Each ItemsOnHand is linked to InventoryTransactions, which allows for easy auditing.
To avoid calculating the actual on-hand items from thousands of inventory transactions, checkpoints are used, which are just a balance plus a date. To calculate items on hand, query for the most recent checkpoint and then add or subtract the items from later transactions to find the current balance. Define new checkpoints periodically.
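The checkpoint calculation can be sketched in plain Python as follows. This is not Django-inventory's actual code; the function name and the data shapes are assumptions, as is the convention that a checkpoint already includes every transaction dated on or before it.

```python
from datetime import date


def items_on_hand(checkpoints, transactions):
    """Balance = latest checkpoint balance + movements dated after it.

    checkpoints: list of (date, balance)
    transactions: list of (date, signed quantity)
    """
    if checkpoints:
        cp_date, balance = max(checkpoints, key=lambda cp: cp[0])
    else:
        cp_date, balance = date.min, 0   # no checkpoint yet: sum everything
    return balance + sum(q for d, q in transactions if d > cp_date)


checkpoints = [(date(2024, 1, 1), 100), (date(2024, 2, 1), 80)]
transactions = [
    (date(2024, 1, 15), -20),   # already reflected in the February checkpoint
    (date(2024, 2, 10), -5),
    (date(2024, 2, 20), 10),
]
current = items_on_hand(checkpoints, transactions)
```

In a database the same thing is one query for the latest checkpoint and one `SUM` over transactions with a later date, so the cost depends on the checkpoint interval rather than on the full history.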
I can see some benefit to having the two columns, but I'm not following the part about discrepancies - you seem to be implying that having the two columns (in and out) is less prone to discrepancy than a single column (current). Why is that?
It's not about having one or two columns. What I meant by "totaling inventory received - total of inventory sold" is something like this:
Select sum(quantity) as inventory_received from Inventory_entry
Select sum(quantity) as inventory_sold from Sales_items
then
Quantity_on_hand = inventory_received - inventory_sold
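For completeness, the two sums can be combined into a single query. The sketch below runs it against an in-memory SQLite database from Python; the table names come from the post, while the `item_id` column and the sample rows are assumptions added so the example can run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory_entry (item_id INTEGER, quantity INTEGER);
CREATE TABLE Sales_items (item_id INTEGER, quantity INTEGER);
INSERT INTO Inventory_entry VALUES (1, 40), (1, 60), (2, 10);
INSERT INTO Sales_items VALUES (1, 25), (1, 5);
""")

# COALESCE turns "no rows" into 0 so items with no sales still get a number.
QUANTITY_ON_HAND = """
SELECT
    (SELECT COALESCE(SUM(quantity), 0) FROM Inventory_entry WHERE item_id = :item)
  - (SELECT COALESCE(SUM(quantity), 0) FROM Sales_items WHERE item_id = :item)
"""


def quantity_on_hand(conn, item_id):
    """Total received minus total sold for one item."""
    return conn.execute(QUANTITY_ON_HAND, {"item": item_id}).fetchone()[0]
```

With an index on `item_id` in both tables, each subquery only touches the rows for the item being asked about.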
Please keep in mind that I oversimplified this and my initial explanation. I know there is much more to inventory than just keeping track of quantities, but in this case that's where the problem lies and what we want to fix. At this point the reason to change it is precisely the cost of supporting the problems caused by the current design.
Also, I wanted to mention that although this is not a "coding" question, it is related to algorithms and design, which IMHO are very important topics.
Thanks everybody for your answers so far.
Nelson Marmol
We solve different problems, but our approach to some of them might be interesting to you.
We allow the system to make a "best guess", and give the users regular feedback about any of those guesses that look wrong.
To apply this to inventory, you could have 3 fields:
inventory_received
inventory_sold
estimated_on_hand
Then, you could run a process (daily?) along the lines of:
SELECT *
FROM Inventory
WHERE estimated_on_hand != inventory_received - inventory_sold
Of course, this relies on users looking at this alert, and doing something about it.
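A runnable version of that daily check might look like the sketch below, again using SQLite from Python. The column names follow the three fields listed above; the `item_id` key, the sample rows and the function name are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (
    item_id INTEGER PRIMARY KEY,
    inventory_received INTEGER NOT NULL,
    inventory_sold INTEGER NOT NULL,
    estimated_on_hand INTEGER NOT NULL
);
INSERT INTO Inventory VALUES (1, 100, 30, 70), (2, 50, 10, 45), (3, 20, 20, 0);
""")


def find_discrepancies(conn):
    """Return (item_id, estimated, expected) for rows that do not add up."""
    return conn.execute(
        "SELECT item_id, estimated_on_hand, inventory_received - inventory_sold"
        " FROM Inventory"
        " WHERE estimated_on_hand != inventory_received - inventory_sold"
        " ORDER BY item_id"
    ).fetchall()


alerts = find_discrepancies(conn)   # item 2 is off by five in the sample data
```

Returning the expected value alongside the stored one gives whoever reads the alert a head start on working out what went wrong.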
Also, you could have a function to reset inventory somehow, either by updating inventory_sold/received, or perhaps by adding another field, "inventory_adjustment", which could be positive or negative.
... just some thoughts. Hope it's helpful.