I am more familiar with Excel but want all my information in Notion since it’s so powerful.
I’m trying to track weeks of results for multiple individuals on a team. I would like to see a team view but also see the individual performance on a weekly basis.
I want the information from each team member to roll up to a database that has the sum of all teams for that particular week.
Team page
    Teammate 1
    Teammate 2
I work for a research organization in India and have recently joined a program extending immunizations among poor rural communities. They're a fairly large organization but don't really have any IT infrastructure. Data reports on vaccine coverage, logistical questions, meeting attendance, etc. come from hundreds of villages, go from pen and paper through several iterations of data entry and compilation, and finally arrive each month at the central office as HUNDREDS of messy Excel sheets. The organization generally needs nothing more than simple totals and proportions from a large series of indicators, but doctors and high-level professionals are left spending days summing the sheets by hand, introducing lots of error and generally wasting a ton of time. I threw in some formulas and at least automated the process within single sheets, but the compilation and cross-referencing is still an issue.
There's not much to be done at the point of data collection... obviously it would be great to implement some system at the point of entry, but that would involve training hundreds of officials and local health workers, which isn't practical at the moment.
My question: what can be done with the stack of Excel sheets every month so we can analyze them individually and also holistically? Is there any type of management app or simple database we can build to upload and compile the data for easy analysis in R or even (gasp) Excel? What kind of tools could I implement and then pass on to some relative technophobes? Can we house it all online?
I'm by no means a programmer, but I'm an epidemiologist/stats analyst proficient in R and Google products and the general tools of a not-so-tech-averse millennial. I'd be into using this as an opportunity to learn some MySQL or similar, but I need some guidance. Any ideas are appreciated... there has to be a better way!
A step-by-step approach would be: first, store the raw data from the Excel sheets and paper forms in a structured database. Once the data is maintained in a DB, you will have many tools to manipulate it later.
Use any database like MySQL to store the Excel data; Excel sheets or CSV files can be imported into the database directly.
Later, with simple database operations, you can manipulate the data; you can use reports, a web application, etc. to display and manage it.
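For example, a minimal sketch in MySQL; the table, columns, and file name are just placeholders for your actual indicators:

    -- Hypothetical table for the monthly village reports
    CREATE TABLE monthly_report (
        village       VARCHAR(100) NOT NULL,
        report_month  DATE         NOT NULL,
        doses_given   INT,
        meetings_held INT,
        PRIMARY KEY (village, report_month)
    );

    -- Each Excel sheet is saved as CSV and loaded in one step
    LOAD DATA LOCAL INFILE 'district_reports_march.csv'
    INTO TABLE monthly_report
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    IGNORE 1 LINES;

    -- The totals currently summed by hand become a single query
    SELECT report_month, SUM(doses_given) AS total_doses
    FROM monthly_report
    GROUP BY report_month;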
And keep up the good work!
I'm having some trouble designing my data warehouse. Here's the context:
Financial people register our deals and report a financial snapshot every month. When they register new deals, they also indicate some information like which equipment is sold, to which customer, etc. (our dimensions).
Project managers add additional data to these deals with milestone information (project start date, customer acceptance date, etc.), also on a monthly basis.
Finance will only use the finance information; project managers could use both types of information.
Based on this, I have several possible scenarios. Which is the best?
1st scenario: star schema
In this scenario, I have two separate fact tables for finance and project management. But the thing is that I will have to duplicate the references to the dimensions (equipment, customer, etc.) in both tables, since it is Finance that declares deals and that information has to stay consistent for a given deal.
First Scenario Schema
2nd scenario: one common table
As we have the same granularity (both are monthly snapshots), we could merge the finance and project management information into a single table and propose two views to the users. But I fear that it will become a mess (different enterprise functions in a single table...).
3rd scenario: snowflake schema
We could also add a "Deal" table, containing all the references to the other dimensions (customer, equipment, etc.).
Third Scenario Schema
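To make this third scenario concrete, here is a rough sketch of what I have in mind (the table and column names are simplified placeholders, not our real model):

    -- Dimension tables (simplified)
    CREATE TABLE dim_customer  (customer_id  INT PRIMARY KEY, name VARCHAR(100));
    CREATE TABLE dim_equipment (equipment_id INT PRIMARY KEY, name VARCHAR(100));

    -- The shared "Deal" table carries the dimension references once
    CREATE TABLE dim_deal (
        deal_id      INT PRIMARY KEY,
        customer_id  INT NOT NULL REFERENCES dim_customer (customer_id),
        equipment_id INT NOT NULL REFERENCES dim_equipment (equipment_id)
    );

    -- Both monthly snapshot fact tables point to the same deal,
    -- so the dimension links stay consistent for a given deal
    CREATE TABLE fact_finance_snapshot (
        deal_id        INT  NOT NULL REFERENCES dim_deal (deal_id),
        snapshot_month DATE NOT NULL,
        amount         DECIMAL(12,2),
        PRIMARY KEY (deal_id, snapshot_month)
    );

    CREATE TABLE fact_project_snapshot (
        deal_id         INT  NOT NULL REFERENCES dim_deal (deal_id),
        snapshot_month  DATE NOT NULL,
        startup_date    DATE,
        acceptance_date DATE,
        PRIMARY KEY (deal_id, snapshot_month)
    );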
Thanks in advance for any useful advice!
I am implementing a simple application for expense reporting. The application will use GAE. In my application I have several entities (classes) like Year, Month, Day, Expenses, Account, and so on. The picture is as follows: a user can create an Account, then start to declare expenses with a simple form. The expenses are stored in the GAE Datastore. Every Year has Months, every Month has Days, and every Day has declared Expenses.

The problem is that I don't know how to arrange these entities in the non-relational database of GAE. I read several tutorials and articles from the Google Developers website, but I still don't understand the concept of parent/child relationships and entity groups. Can anyone help with some tutorials, videos, articles, or books on how to design the relationships and store your entities in a non-relational database like the GAE Datastore? Thanks in advance.

I forgot to mention that I would like to use the GAE low-level datastore.
If you are using Java, I would suggest using Objectify. It's just so much easier than JPA, for me at least.
You are paying per read and write, so if, for instance, you can fit all of the data for a month in 1 MB, then I would not have a separate entity for Day. Anyway, I don't understand your requirements, e.g., why Year has to be an entity when it can just be a property that you filter by. I would actually think about just having a Day entity with Year and Month properties to filter by.
http://code.google.com/p/objectify-appengine/wiki/IntroductionToObjectify#Relationships
In MongoDB you would have "embedded" documents. I don't know if GAE is as evolved as MongoDB; I suspect not. Perhaps you should look at another, better-documented NoSQL database if you are having problems with documentation at this stage. I'd have a look at the MongoDB site anyway; if you have a background in SQL, you can see the mapping in terminology between the two cultures. Of course, any NoSQL database is inherently non-transactional, so when the app develops to track expense payments, there may be some insuperable issues later.
I'm running a classifieds website that has ads and comments on it. Traffic has grown to a considerable amount, and the number of ads in the system has reached over 1.5 million, out of which nearly 250K are active ads.
Now the problem is that the system has been designed to be very dynamic in terms of the categories of the ads and the properties each kind of ad can have based on its category or subcategory; therefore, to display an ad I have to join nearly 4 to 5 tables.
To solve this issue I have created a flat table (conceptually what I call a publishing table) and populate that table with an SQL Job every 3 to 4 minutes. Now for web requests I query that table to show ad listings or details.
I also have implemented a data cache of around 1 minute for each unique URL combination for ad listings and for each ad detail.
I do the same thing for comments on ads (i.e. I cache the comments, and as the comments are hierarchical, I have used the flat-table publishing model for them also, again populated with a SQL job).
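Conceptually, the publishing job is just a denormalizing query along these lines (simplified, with made-up table and column names):

    -- Rebuild the flat publishing table from the normalized tables;
    -- the scheduled job runs this every 3 to 4 minutes
    TRUNCATE TABLE ad_published;

    INSERT INTO ad_published (ad_id, title, category_name, city_name, price)
    SELECT a.ad_id, a.title, c.name, ci.name, a.price
    FROM ads a
    JOIN categories c ON c.category_id = a.category_id
    JOIN cities ci    ON ci.city_id    = a.city_id
    WHERE a.is_active = 1;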
My questions are as follows:
1. Is the publishing model with a background SQL job a good design approach?
2. What approach would you or other people take for scenarios like this?
3. How does a website like Facebook show comments in real time with millions of users, making sure that they do not lose any comment data by only keeping it in the cache and doing batch updates?
Starting at the end:
3. How does a website like Facebook show comments in real time with millions of users, making sure that they do not lose any comment data by only keeping it in the cache and doing batch updates?
Three things:
1. Smarter programming than yours. They can put a large team on solving this problem for months.
2. Ignorance. They really don't care too much about a cache being a little outdated. No one will really notice.
3. Hardware ;) Many more, and more powerful, servers than yours.
That said, your approach sounds sensible.
We have an architecture where we provide each customer Business-Intelligence-like services for their website (internet merchants). Now I need to analyze those data internally (for algorithmic improvement, performance tracking, etc.), and they are potentially quite heavy: we have up to millions of rows per customer per day, and I may want to know how many queries we had in the last month, compared week over week, etc. That is on the order of billions of entries, if not more.
The way it is currently done is quite standard: daily scripts which scan the databases and generate big CSV files. I don't like this solution for several reasons:
as is typical with those kinds of scripts, they fall into the write-once-and-never-touched-again category
tracking things in "real time" is necessary, which daily scripts can't do (we have a separate toolset to query the last few hours at the moment)
this is slow and non-"agile"
Although I have some experience dealing with huge datasets for scientific usage, I am a complete beginner as far as traditional RDBMSs go. It seems that using a column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of issue.
You will want to google Star Schema. The basic idea is to model a special data warehouse / OLAP instance of your existing OLTP system in a way that is optimized to provide the type of aggregations you describe. This instance will be comprised of facts and dimensions.
In the example below, sales 'facts' are modeled to provide analytics based on customer, store, product, time and other 'dimensions'.
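A minimal sketch of what such a schema could look like (names are illustrative only; your real facts and dimensions would differ):

    -- Dimensions describe how you want to slice the data
    CREATE TABLE dim_date (
        date_id       INT PRIMARY KEY,
        full_date     DATE,
        week_of_year  INT,
        month_of_year INT,
        year_number   INT
    );
    CREATE TABLE dim_customer (customer_id INT PRIMARY KEY, name VARCHAR(100));
    CREATE TABLE dim_store    (store_id    INT PRIMARY KEY, name VARCHAR(100));
    CREATE TABLE dim_product  (product_id  INT PRIMARY KEY, name VARCHAR(100));

    -- The fact table holds one row per sale, keyed by the dimensions
    CREATE TABLE fact_sales (
        date_id     INT NOT NULL REFERENCES dim_date (date_id),
        customer_id INT NOT NULL REFERENCES dim_customer (customer_id),
        store_id    INT NOT NULL REFERENCES dim_store (store_id),
        product_id  INT NOT NULL REFERENCES dim_product (product_id),
        amount      DECIMAL(12,2) NOT NULL
    );

    -- "Last month, compared week over week" becomes a simple rollup
    SELECT d.week_of_year, COUNT(*) AS sales, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_date d ON d.date_id = f.date_id
    WHERE d.year_number = 2012 AND d.month_of_year = 4
    GROUP BY d.week_of_year
    ORDER BY d.week_of_year;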
You will find Microsoft's Adventure Works sample databases instructive, in that they provide both the OLTP and OLAP schemas along with representative data.
There are specialized DBs for analytics like Greenplum, Aster Data, Vertica, Netezza, Infobright, and others. You can read about those DBs on this site: http://www.dbms2.com/
The canonical handbook on star-schema-style data warehouses is Ralph Kimball's "The Data Warehouse Toolkit" (there's also "Clickstream Data Warehousing" in the same series, but that is from 2002, I think, and somewhat dated; if there's a newer edition of the Kimball book, it might serve you better). If you google "web analytics data warehouse" there are a bunch of sample schemas available to download and study.
On the other hand, a lot of the NoSQL that happens in real life is based around mining clickstream data, so it might be worth seeing what the Hadoop/Cassandra/[latest-cool-thing] community has in the way of case studies, to see if your use case matches well with what they can do.