need sample data for e-commerce class project - dataset

In my CS course project this fall, we have to build a small eCommerce app (like Amazon, eBay, etc.). We are free to build any type of eCommerce/store app. Since I don't have a preference for what to build, it may be easiest to decide based on what sample data is freely available. So is there a freely available dataset that represents a set of products, like groceries, movies, books, cars, apps, electronics, weapons, library, etc.? It doesn't have to be real, but as long as it saves me a few hours of entering data, it will be worthwhile. An open data format would be useful; a MySQL database would be great.
Perhaps I should use the Northwind database from MSSQL?
Is there a "Northwind" type database available for MySQL?
I haven't looked at all the references in this post, but it looks promising: Where can I find sample databases with common formatted data that I can use in multiple database engines?
Any suggestions for eCommerce sample datasets?

At this link you can find some e-commerce datasets in comma-separated values (CSV) format, namely snapshots of Amazon, Google Products, Abt and Buy.
http://dbs.uni-leipzig.de/en/research/projects/object_matching/fever/benchmark_datasets_for_entity_resolution
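Once a CSV file is downloaded, loading it into a database takes only a few lines. The following is a minimal sketch, not a description of the actual files at that link: the column names `id`, `title` and `price` are assumptions, the sample rows are made up, and SQLite from the Python standard library stands in for MySQL so the example runs without a server.

```python
import csv
import io
import sqlite3


def load_products(csv_file, conn):
    """Create a products table and fill it from a CSV with id,title,price columns.

    The column names are hypothetical; adjust them to the real file header.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "id TEXT PRIMARY KEY, title TEXT NOT NULL, price REAL)"
    )
    reader = csv.DictReader(csv_file)
    rows = [
        # An empty price field is stored as NULL instead of failing on float("").
        (r["id"], r["title"], float(r["price"]) if r["price"] else None)
        for r in reader
    ]
    conn.executemany("INSERT OR REPLACE INTO products VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Tiny made-up sample standing in for a downloaded CSV file.
SAMPLE_CSV = "id,title,price\nb1,Example Book,12.50\ne1,Example Headphones,\n"

conn = sqlite3.connect(":memory:")
loaded = load_products(io.StringIO(SAMPLE_CSV), conn)
```

For a real file, pass `open("products.csv", newline="", encoding="utf-8")` (a hypothetical file name) instead of the `io.StringIO` object. With MySQL the same idea applies, using a MySQL driver and its own placeholder syntax.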

You can try the nopCommerce sample data. Download from http://nopcommerce.codeplex.com. During installation, tick the "create sample data" box. You can then use the data within your own application. The schema is pretty easy to understand.

Related

Simple centralized system to store and share information at an office

I work at an office with some colleagues generating and consuming structured data which can be normally stored in a database. For instance:
- data from several countries: capital, population, currency, ...
- forecasts of the evolution of the population in each country: each year we generate one different time series
We store these data in dozens of Excel files (which is the latest version? where are they stored? are they in a shared directory?), and we produce lots of documents from these data (PowerPoint files, other Excel files to make calculations, ...).
I know how to install a MySQL server on Linux, and I could build a web app to generate and store data, and an API to consume the data. But I wondered if there was a smarter solution for implementing a simple centralized system to store and share information at an office.
Thank you very much.
It may be better to use a cloud service here instead of building things from scratch, just to save time and effort. Here are two cloud service solutions with some pros and cons of their features. If either looks useful, I would recommend looking deeper into it.
Cloud Service 1
If storing and sharing files is the key point here, using an online storage system like box.com would be a good solution. I personally like box.com better than Dropbox since it seems to have better admin capabilities when working in teams (I may just be biased here).
Pros:
- there are version histories for uploaded files, and files can be locked so they cannot be downloaded
- there are access stats (logs) for each file, so you know when someone viewed or downloaded files
- there's an area for Box users to leave comments on each uploaded file
- Excel/Word/PowerPoint files can be previewed in the browser before downloading them (and other files as well; I personally found the preview of vector files very useful)
- directories and files are immediately accessible via mobile after uploading
- shared links can be generated for each file or directory so that users without Box accounts can view and download
Cons:
- users will need a Box account to upload files (even for the shared links)
- users may be more used to the Dropbox UI and may find that Box has a different UX than expected (although the UI is not hard to master at all)
Cloud Service 2
If finding an SQL alternative is the key point here, using an online database platform such as kintone would be a good solution. kintone allows you to build customizable online tables (called "Apps" in kintone) using drag and drop.
Pros:
- live graphs can be generated on kintone from the stored data
- database tables can be created and updated really easily with just the GUI
- you can define table columns (or "Fields") to store attachment files
- each row (or "Record") has an area for users to leave comments
- each Record has a history feature, so you know who edited what contents and when
- nothing is saved locally on your computer, so the latest data is always online (no conflicts occur)
- kintone also has internal forums (or "Spaces") that can be used as an alternative to internal emails
- Apps and Spaces are instantly accessible via mobile
- kintone has open REST APIs and JavaScript customization capabilities for any further UI changes or connecting with other sources
Cons:
- users need a kintone account to view data or to add data into Apps, although there are 3rd party solutions (at a reasonable price) that allow you to do that
- it may not be as intuitive as storing data in Excel spreadsheets (but that's mainly because everyone's used to Excel)
Further questions about cloud services may belong better in the Software Recommendations community: https://softwarerecs.stackexchange.com/
I'm personally good with the kintone APIs, so if you have any further questions related to the API (capabilities, limits, possibilities, etc.), please go ahead and post them here on Stack Overflow.

Dynamic Data Extract Tools

I've been searching for a few weeks now for a tool, either ready-made or something I could build, for dynamically extracting data via a web interface. Basically, what I'm looking for is a way to give users a list of all available data objects from our database, let them pick the ones they'd like to view, set parameters, and then export the results to an Excel file.
Right now we're doing it purely with SQL statements but we have hundreds of objects so as you might imagine, those statements are really complex and prone to errors. It would be great if there was a tool available to do this or if someone had an idea of an easy way to organize this. Any help would be greatly appreciated.
We've looked at BI tools like QlikView and Tableau but that is probably overkill for what we're trying to do. The open-source BI tools we've looked at seemed really primitive in their functionality. The other thing we looked at was MSAS (our DB is SQL Server) but I'd prefer something that was more database-agnostic and lived on a web server instead of on the database.
I think what you are describing is a typical BI reporting tool. I don't know which open-source BI tools you have been looking at, but there are open-source solutions which aren't "primitive" at all. The two main open-source reporting libraries are JasperReports and BIRT. You can design report templates within a graphical interface (a NetBeans plugin called iReport for Jasper, an Eclipse plugin for BIRT). A simple report template is basically an XML file which contains a parameterized SQL query and describes how to display the query results.
End users typically connect to a web application (a Java EE app which uses the reporting library) which executes the report templates: it asks the user to input parameters graphically, through drop-down lists and checkboxes for example, then retrieves the SQL query results from the database and displays them according to the template (tables, charts, etc.). These results can then be exported in many formats, including xls.
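To make the idea concrete, here is a minimal sketch of what such a tool does under the hood: a named, parameterized query is run with user-supplied values and the result is written out in a spreadsheet-friendly format. This is not JasperReports or BIRT code. The `REPORT` dictionary, the `run_report` helper and the `orders` table are all hypothetical, SQLite stands in for the real database, and CSV stands in for xls.

```python
import csv
import io
import sqlite3

# Hypothetical "report template": a named query with named parameters.
REPORT = {
    "name": "orders_by_country",
    "sql": (
        "SELECT id, country, total FROM orders "
        "WHERE country = :country AND total >= :min_total ORDER BY id"
    ),
}


def run_report(conn, report, params, out):
    """Run the report query with bound parameters and write the rows as CSV."""
    cur = conn.execute(report["sql"], params)
    writer = csv.writer(out)
    # The header row comes from the column names of the result set.
    writer.writerow([col[0] for col in cur.description])
    count = 0
    for row in cur:
        writer.writerow(row)
        count += 1
    return count


# Made-up demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "FR", 10), (2, "DE", 20), (3, "FR", 30)],
)

buffer = io.StringIO()
exported = run_report(conn, REPORT, {"country": "FR", "min_total": 5}, buffer)
```

Because the user only supplies parameter values and never SQL text, the query stays fixed and the values are bound by the driver, which keeps hundreds of such reports manageable and avoids SQL injection.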
The JasperReports developers provide a free open-source web app designed to run Jasper reports, called JasperReports Server. Other open-source projects let you execute reports designed with either BIRT or Jasper, for instance ReportServer, which I haven't tested.
At my company we use SpagoBI, which is a fully-fledged free and open source Business Intelligence suite. This means that it has all the features of a commercial BI suite. The most useful is probably the ad-hoc query editor, which lets users with little technical knowledge design simple queries by dragging and dropping fields, and it performs the underlying table joins for them. It then lets users design simple reports such as pie charts or line charts from the data they just extracted. This sort of feature is part of the commercial editions of JasperReports Server and Actuate One (the BIRT equivalent of JasperReports Server which doesn't have a free version).
SpagoBI is a great, powerful tool and I recommend it, but it is also quite difficult to configure and to master. Maybe if your needs are only to execute pre-defined reports you had better go with one of the other solutions.
PowerPivot, Data Explorer, or Microsoft Query?
Sorry, I didn't see that you wanted a web interface...
You can try to reuse some of the data export functions from SQL Web Data Administrator: http://sqlwebadmin.codeplex.com/
Or you can install the web tool but restrict access to all of its pages other than the data export pages.
Cognos BI (specifically, the web-based Query Studio) fits this bill perfectly and is a great tool to deploy to non-technical web users.
It does require a pretty robust setup and is not cheap but it is an enterprise-class product. I've only worked with the full-scale deployment but they also have an Express product for small/midsize companies.
If you could clarify number of users, database size, expected query volume, and budget, we could refine the toolset further...

Which database does YouTube use at the moment?

I hope someone can help me out with this topic, even if it's not a specific programming question.
I'm writing a bachelor thesis in which I compare MySQL to MongoDB, and I want to write something about YouTube, as the platform has to handle many requests with a heavy data load.
The only good resource I found was this video: Seattle Conference on Scalability: YouTube Scalability
As the conference was in 2007, I imagine there have been some updates regarding the database.
The last information that I have from this talk is that the thumbnails are stored in a BigTable database and the metadata in MySQL. Are there any changes since then?
Where are the videos stored? Is there an entry in the MySQL table, which refers to the stored video?
Thanks in advance for the answer!
According to this, YouTube still uses MySQL: http://code.google.com/p/vitess/wiki/ProjectGoals
I am not sure how things are at YouTube, but I am in the process of developing a similar application for our client. What we are doing is using the best of both worlds, i.e. SQL and NoSQL.
We store the videos on disk and store the paths to these videos in a MySQL table. Then we have a separate table which holds the genre-to-video mapping, i.e. which video belongs to which genre.
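A minimal sketch of that layout, under stated assumptions: the table and column names (`videos`, `genres`, `video_genres`, `file_path`) and the sample paths are invented for illustration, and SQLite stands in for MySQL so the example runs as-is. Only the file path lives in the database; the video bytes stay on disk.

```python
import sqlite3

# Hypothetical schema: videos hold a path to the file on disk,
# and video_genres is the mapping table between videos and genres.
SCHEMA = """
CREATE TABLE videos (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    file_path TEXT NOT NULL
);
CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE video_genres (
    video_id INTEGER NOT NULL REFERENCES videos(id),
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (video_id, genre_id)
);
"""


def paths_for_genre(conn, genre):
    """Return the on-disk paths of all videos tagged with the given genre."""
    rows = conn.execute(
        "SELECT v.file_path FROM videos v "
        "JOIN video_genres vg ON vg.video_id = v.id "
        "JOIN genres g ON g.id = vg.genre_id "
        "WHERE g.name = ? ORDER BY v.id",
        (genre,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?)",
    [
        (1, "Clip one", "/data/videos/1.mp4"),
        (2, "Clip two", "/data/videos/2.mp4"),
        (3, "Clip three", "/data/videos/3.mp4"),
    ],
)
conn.executemany("INSERT INTO genres VALUES (?, ?)", [(1, "music"), (2, "sports")])
# A video can belong to several genres, hence the separate mapping table.
conn.executemany("INSERT INTO video_genres VALUES (?, ?)", [(1, 1), (2, 2), (3, 1), (3, 2)])

music_paths = paths_for_genre(conn, "music")
```

The mapping table is what allows one video to appear under several genres without duplicating its row or its file.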
Today there is a vast pool of user data that we can leverage as never before, so things are quite different from 2007. Given how popular sites like YouTube are and how much people depend on them, there is a large amount of unstructured data which, if used properly, can give great results. So in our project we store the site administration and reporting data (user DB, video locations, genre mapping, etc.) in MySQL, and store the unstructured data about user interactions in a NoSQL database. We then run all the analytics on the NoSQL data and use it to give appropriate results to the user.
They are using MySQL with Bigdata.
User information, such as who uploaded the file, and the file information will all be stored in MySQL, and the data itself will be stored in Bigdata.
I think they are using a database that supports FileTable.

Data input to a website

I'm new to website design and am learning how to put together a data-driven website that will help users with calorie/vegetarian types of queries. My question: do big sites like DailyBurn and SparkPeople rent a database or build their own? I know users' data is stored on their sites, so do they have separate DBs for user input and calorie output? If someone is building their site from scratch, is it better and cheaper to create their own DBs or to pay for an existing one?
The other negative is that a site like CalorieKing requires me to show their name on any queries, I think even with the paid service, which I do not want to do.
Thanks
H
They're probably going to be separate tables of the same database.
I'm not exactly sure what you mean by creating your own database, but with the advent of AWS, hosting one is dirt cheap.

need sample data for e-commerce master research

In my MSc research, I have to build an eCommerce app (like Amazon, eBay, etc.), but a location-based one. I need freely available sample data for the store. So is there a freely available dataset that represents a set of products, like groceries, movies, books, cars, apps, electronics, weapons, library, etc.? It needs to be large enough for analysis: a database with at least 200 customers, 1000 products in different categories, and 1000 orders. The customer data has to include information such as age, sex, location and education.
You could try the Microsoft Contoso BI Demo Dataset. I have not used it, so I do not know if it will meet all of your requirements. However, given that it is used to demonstrate BI functionality across all of MS' BI products, one would hope it was fairly comprehensive.
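If no existing dataset has the required shape, a synthetic one of the requested size can be generated as a fallback. The following is only a sketch: every field name, category, city label and value range is an assumption, and the values are drawn uniformly at random, so they carry no realistic correlations and should not be treated as real customer behaviour in an analysis.

```python
import random

# All labels below are placeholders, not taken from any real dataset.
EDUCATION = ["primary", "secondary", "bachelor", "master", "doctorate"]
CITIES = ["City A", "City B", "City C", "City D"]
CATEGORIES = ["groceries", "movies", "books", "electronics"]


def make_dataset(n_customers=200, n_products=1000, n_orders=1000, seed=0):
    """Build lists of customer, product and order dicts with random values."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    customers = [
        {
            "id": i,
            "age": rng.randint(18, 80),
            "sex": rng.choice(["F", "M"]),
            "location": rng.choice(CITIES),
            "education": rng.choice(EDUCATION),
        }
        for i in range(1, n_customers + 1)
    ]
    products = [
        {
            "id": i,
            "category": rng.choice(CATEGORIES),
            "price": round(rng.uniform(1, 500), 2),
        }
        for i in range(1, n_products + 1)
    ]
    orders = [
        {
            "id": i,
            # Every order points at an existing customer and product id.
            "customer_id": rng.randint(1, n_customers),
            "product_id": rng.randint(1, n_products),
            "quantity": rng.randint(1, 5),
        }
        for i in range(1, n_orders + 1)
    ]
    return customers, products, orders


customers, products, orders = make_dataset()
```

The three lists can then be written to CSV or inserted into a database. For a location-based study, the placeholder city labels would need to be replaced with real coordinates or regions.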
