Is there a dataset for products (UPC/EAN level) and their recycling information? - dataset

I am looking to do some analysis of plastic recycling and would like to know whether there is a dataset that gives recycling information for products sold in the US. For example, a product with a given UPC/EAN number has a resin code of 1 (the number written at the bottom of a plastic container). Any ideas on how to start creating such a dataset would be helpful as well. I understand there are sources that give information for a generic 1-gallon milk container, but I am looking for information at the brand/manufacturer level.
Thanks

Related

How can I filter capital letters in a data set

I have a column with more than 150k rows, each cell containing text. Some cells have a problem: they contain sentences written in capital letters. I want to fix this, so how can I filter these cells to find out how many there are? For example, I have cells like the one below, and I want to detect the cells that contain capitalized sentences so that I can fix them:
Each year, They carefully curate the finest gifts to fill our Baskits
from new and wellloved brands, to unique products made exclusively for
us. They specialize in helping busy professionals give thoughtful,
impactful gifts for business development, colleague/employee
recognition, holiday gifts and more. Here are the top three things to
know about us: 1) They ARE CANADA'S LEADING GIFT DELIVERY SERVICE 30+
years of experience and 20, 000 customers 2) They MAKE THOUGHTFUL
GIFTING QUICK AND EASY Online and Mobile Webstore (Open 24/7) Call
Centre Gift Specialists Two Retail Stores (Downtown + North Toronto)
3) They HAVE DELIVERY OPTIONS TO SUIT YOUR NEEDS Delivery Across North
America sameday (and Saturday) Delivery in the GTA
try in A2:
=INDEX(REGEXMATCH(A2:A; "(.*)[A-Z]{2}(.*)"))
or if you want a count:
=SUMPRODUCT(1*REGEXMATCH(A2:A; "(.*)[A-Z]{2}(.*)"))
Since it looks like you're in Google Sheets, just do a REGEXMATCH() for two capital letters in a row (as a review flag):
=BYROW(A2:A, LAMBDA(x, REGEXMATCH(x, "(.*)[A-Z]{2}(.*)")))
The BYROW() makes it a one-liner for the entire column. Ditch that if needed.
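If the column is ever exported from the sheet, the same check could be run outside Sheets. This is only a rough Python sketch of the idea: the sample cells are shortened from the question, and `has_caps_run` is a made-up helper name.

```python
import re

# Two consecutive ASCII capitals, the same pattern the formulas above look for.
CAPS_RUN = re.compile(r"[A-Z]{2}")

def has_caps_run(text: str) -> bool:
    """Return True if the text contains two capital letters in a row."""
    return CAPS_RUN.search(text) is not None

# Sample cells, shortened from the text in the question.
cells = [
    "Each year, They carefully curate the finest gifts.",
    "1) They ARE CANADA'S LEADING GIFT DELIVERY SERVICE",
    "Delivery in the GTA",
]

flags = [has_caps_run(c) for c in cells]  # one flag per cell
flagged_count = sum(flags)                # how many cells need review
print(flags, flagged_count)
```

Note that the third cell is flagged only because of the acronym GTA, so the result is best treated as a review flag and not as a definitive list of shouted sentences.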

CakePHP4 - idea for database across multiple tables

I'm creating a body-wax comparison website as a personal project with CakePHP 4, and I'm stuck on a problem and don't know what to do.
Situation:
There are multiple body-waxing companies, and each company offers many waxing options such as legs, arms,
beard, etc. Each company has a different price for each body part.
CompanyA->arms = $60
CompanyA->legs = $30
CompanyB->arms = $50
I have already set up the Companies table and the Parts table as in the image below. I also came up with the idea of a Prices table, but I'm not sure whether it's doable or whether I need to come up with something else.
Ideally, I want to edit the prices on the companies' edit/add pages.
Any help would be appreciated.
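For what it's worth, the Prices table idea can be sketched as a join table that holds one row per company/part pair, with the price stored on that row. The sketch below uses Python's built-in sqlite3 only to illustrate the table layout; it is not CakePHP code, and every column name is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (company, part) pair; the price lives on the link itself.
-- Column names are assumed for illustration.
CREATE TABLE prices (
    company_id INTEGER NOT NULL REFERENCES companies(id),
    part_id INTEGER NOT NULL REFERENCES parts(id),
    price INTEGER NOT NULL,
    PRIMARY KEY (company_id, part_id)
);
INSERT INTO companies VALUES (1, 'CompanyA'), (2, 'CompanyB');
INSERT INTO parts VALUES (1, 'arms'), (2, 'legs');
INSERT INTO prices VALUES (1, 1, 60), (1, 2, 30), (2, 1, 50);
""")

# List every price together with its company and body part.
rows = conn.execute("""
    SELECT c.name, p.name, pr.price
    FROM prices pr
    JOIN companies c ON c.id = pr.company_id
    JOIN parts p ON p.id = pr.part_id
    ORDER BY c.name, p.name
""").fetchall()
print(rows)
```

With this layout, editing a company's prices means inserting or updating rows in prices for that company, which is the kind of data a company edit/add page would need to save.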

How to skip rows with the same values

I have the following problem: I have a dataset with over 1 million entries (shown below) that includes the variables company (the name of the company, a string), reviews (the number of reviews a company received), and company1 (a numeric ID assigned to each company name). I want to calculate the average number of reviews a company in the dataset receives. But if I just do sum reviews, it counts the reviews of company 3 twice, the reviews of company 5 23 times, and so on (as often as they are listed in the data). How do I avoid this and count each company only once?
Your image is not readable (by me on a laptop). The Stata tag wiki gives detailed advice on how to give data examples and the command dataex bundled with recent versions of Stata is easily used for SE.
The flavour of your request is easier to follow. Here is an analogue. With the Grunfeld data we can calculate a mean investment for each year.
webuse grunfeld, clear
egen mean = mean(invest), by(year)
Now we might want to know how many years had mean invest above 200 (in the units used)?
su mean if mean > 200
or
count if mean > 200
returns the number of observations (not years). If you try it, the result is 30. In the Grunfeld data, there are 10 companies each measured for each year, so dividing by 10 is an easy answer. For more complicated datasets, it would be better to tag each year just once, and then look only at tagged observations:
egen tag = tag(year)
count if tag & mean > 200
It would be more common to tag panels, not years, but the principle is the same. See the help for egen.
collapse and contract offer other routes, with or without using frames.
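For readers outside Stata, the tag-once idea applied to the question's layout might look like the following plain-Python sketch. The rows are invented for illustration; the only assumption carried over from the question is that a company's review count is repeated on every row where the company appears.

```python
# Hypothetical rows in the question's layout: (company, reviews),
# with each company's review count repeated on every one of its rows.
rows = [
    ("A", 10),
    ("B", 4),
    ("B", 4),
    ("C", 7),
    ("C", 7),
    ("C", 7),
]

# Keep only the first row seen per company, similar in spirit to
# `egen tag = tag(company)` followed by `... if tag`.
reviews_by_company = {}
for company, reviews in rows:
    reviews_by_company.setdefault(company, reviews)

# Averaging over all rows over-weights companies that are listed often.
naive_mean = sum(r for _, r in rows) / len(rows)
# Averaging over one value per company counts each company once.
per_company_mean = sum(reviews_by_company.values()) / len(reviews_by_company)
print(naive_mean, per_company_mean)
```

Here the naive mean is 6.5, while the per-company mean is 7.0, because companies B and C no longer count several times.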

User search pricing calculation

I'm building a search engine that provides a list of cab drivers. We have some requirements:
The user is searching for the cheapest cab driver to take them from place A to place B. They can go from any place to any place.
The default formula would be distance * price per mile.
But there are also special prices, e.g. AMSTERDAM to THE HAGUE would always be 100 EUR.
The price per mile is season-based: winter and summer have different prices.
Faceted search based on attributes, e.g. Champagne/Luxury/Male/Female driver, etc.
The user wants to sort by cheapest ride, but also by distance.
What would be the best approach to fit all these requirements? I've tried Solr but have not found a good solution for putting the price model in there. Any ideas?
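To make the pricing rules concrete, here is a rough Python sketch of the model as described, independent of any search engine. Everything in it is hypothetical: the driver names, the rates, the 40-mile distance, and the assumption that a special route price applies to all drivers.

```python
from dataclasses import dataclass

# Hypothetical fixed-price routes; assumed here to apply to every driver.
FIXED_ROUTES = {("AMSTERDAM", "THE HAGUE"): 100.0}

@dataclass
class Driver:
    name: str
    rate_per_mile: dict  # season name -> price per mile in EUR

def quote(driver, origin, destination, distance_miles, season):
    """Return the fixed route price if one exists, else distance * seasonal rate."""
    fixed = FIXED_ROUTES.get((origin.upper(), destination.upper()))
    if fixed is not None:
        return fixed
    return distance_miles * driver.rate_per_mile[season]

drivers = [
    Driver("d1", {"winter": 2.0, "summer": 2.5}),
    Driver("d2", {"winter": 1.5, "summer": 3.0}),
]

# Cheapest first for a 40-mile winter ride with no special price.
quotes = sorted((quote(d, "Utrecht", "Rotterdam", 40, "winter"), d.name) for d in drivers)
special = quote(drivers[0], "Amsterdam", "The Hague", 40, "summer")
print(quotes, special)
```

Because the price depends on the route and the season, it has to be computed per query in some form; how to push that computation into Solr or another engine is a separate question this sketch does not answer.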

How to keep track of changing items in a stock portfolio?

I have a system where people can pick some stocks and it values their portfolios, but I'm having trouble doing this in an efficient way on a daily basis because I'm creating entries for days that don't have any changes (think of it as measuring the values while keeping version control, so I can track changes to the way the portfolio is designed).
Here's an example (each day's portfolio with stock name and weight):
Day1:
ibm = 10%
microsoft = 50%
google = 40%
Day5:
ibm = 20%
microsoft = 20%
google = 40%
cisco = 20%
I can measure the value of the portfolio on day 1 and understand that I need to measure it again on day 5 (when it changed), but how do I measure days 2-4 without recreating day 1's entry in the database?
My current approach (which I don't like) is to create a temp entry in my database when someone changes the portfolio. At the end of the day, when I calculate the values, I use the temp entry if there is one; otherwise I create a new entry (for days 2-4) using the previous day's data. The issue is that, as the data often doesn't change, I'm creating entries that are basically duplicates. The catch is that my stock data is all daily. I also thought of taking the portfolio and, if it hasn't been updated in 3 days, finding the returns of the last 3 days for each stock, but I wasn't sure whether there was a better solution.
Any ideas? I think this is a straightforward problem, but I just can't see an efficient way of doing it.
Note: in finance terms, it's called creating a NAV, and most firms do it the inefficient way I'm doing it, but that's because the process was created about 50 years ago and hasn't changed. I think this problem is very similar to version control, but I can't seem to come up with a solution.
In storage terms it makes most sense to just store:
UserId - StockId1 - 23% - 2012-06-25
UserId - StockId2 - 11% - 2012-06-26
UserId - StockId1 - 20% - 2012-06-30
So you see that stock 1 went down on the 30th. Now if you want to know the StockId1 percentage on the 28th, you just select:
SELECT *
FROM stocks
WHERE user_id = 1 -- user_id and stock_id are assumed column names
  AND stock_id = 1
  AND datecolumn <= DATE('2012-06-28')
ORDER BY datecolumn DESC LIMIT 0,1
If it returns nothing, you did not hold the stock; otherwise you get the latest position back.
By the way, if you need, for example, a graph of stock 1, you could left join against a table full of dates. Then you can fill in the gaps easily.
I found this post, for example:
UPDATE mytable
SET number = (@n := COALESCE(number, @n))
ORDER BY date;
SQL QUERY replace NULL value in a row with a value from the previous known value
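The store-only-changes approach can be tried end to end with Python's built-in sqlite3. This is a sketch under assumptions: `stocks` and `datecolumn` follow the answer, while `user_id`, `stock_id`, `weight` and the helper `weight_on` are invented names, and dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# `stocks` and `datecolumn` come from the answer; the other column names are assumed.
conn.execute(
    "CREATE TABLE stocks (user_id INTEGER, stock_id INTEGER, weight REAL, datecolumn TEXT)"
)
# Only changes are stored, one row per change.
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?, ?)",
    [
        (1, 1, 23.0, "2012-06-25"),
        (1, 2, 11.0, "2012-06-26"),
        (1, 1, 20.0, "2012-06-30"),
    ],
)

def weight_on(user_id, stock_id, day):
    """Return the latest stored weight on or before `day`, or None if there is none."""
    row = conn.execute(
        "SELECT weight FROM stocks "
        "WHERE user_id = ? AND stock_id = ? AND datecolumn <= ? "
        "ORDER BY datecolumn DESC LIMIT 1",
        (user_id, stock_id, day.isoformat()),
    ).fetchone()
    return row[0] if row else None

# Build a gap-free daily series without storing a row per day.
days = [date(2012, 6, 24) + timedelta(days=i) for i in range(7)]
daily = {d.isoformat(): weight_on(1, 1, d) for d in days}
print(daily)
```

The 24th comes back as None (no position yet), the 25th through the 29th carry the 23% forward, and the 30th picks up the change to 20%, so days without changes need no rows of their own.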
