Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I am relatively new to the field of data mining. I am currently implementing some data preprocessing algorithms such as PCA and min-max normalization. Our professor said we could download data sets available on the web. To begin with, though, I want a simple data set with a relatively small number of attributes for my algorithms; I would then switch to more complex data sets.
Can anyone provide links to simple data sets that you have used with your own data mining algorithms, e.g. something pertaining to students' marks, age, height, or a company's employee data? Any assistance would be greatly appreciated.
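To make concrete what kind of data set would be enough to start with, here is a minimal sketch of min-max normalization on a tiny table. The student records are synthetic values invented purely for illustration, and the function name `min_max_normalize` is my own, not taken from any library.

```python
# Synthetic records invented for illustration:
# (age in years, height in cm, marks out of 100).
students = [
    (18, 160.0, 55.0),
    (19, 172.5, 78.0),
    (21, 181.0, 91.0),
    (20, 165.5, 64.0),
]


def min_max_normalize(rows):
    """Rescale each column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append(tuple(
            # A constant column has no spread, so map it to 0.0
            # instead of dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ))
    return normalized


normalized_students = min_max_normalize(students)
for row in normalized_students:
    print(row)
```

A handful of numeric columns like this is enough to check the preprocessing code by hand before moving to a real data set.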
Infochimps.com
Researchpipeline.com
More and many more links here: Some Datasets Available on the Web
I used Stack Overflow's data for my data mining class.
I am not sure whether this will help, but I gathered some websites that provided useful data when I was working with recommender systems.
Here is the list:
http://girlincomputerscience.blogspot.com.br/2010/12/datasets.html
Closed 8 years ago.
I need to design some data structures for something that's roughly a cross between a database and a file system. In fact, there's a large overlap in the design of file systems and databases. I've done some work on both, not very in-depth, but I have some idea about the topics. However, I need more information, and I am wondering if there's a good overview of design concepts out there: overviews of algorithms, data structures and the like; the advantages of different types of trees or hash tables for key lookup; perhaps some information about data ordering and alignment. In short, the experiences of other people in implementing file systems and databases that would help the next person avoid the same mistakes.
Gray wrote a book titled "Transaction Processing: Concepts and Techniques" - http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902 - which covers a great deal of what you would need to build your own database.
One starting point for file systems would be http://www.amazon.com/The-Design-UNIX-Operating-System/dp/0132017997 - but I suspect you would have to chase down individual papers or Linux source code to keep up with the variety of different file systems created since then.
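As a small illustration of the tree-versus-hash-table trade-off raised in the question, here is a rough sketch. The records and their `inode-*` names are hypothetical, and a sorted key list with binary search only stands in for the ordering that a B-tree would maintain; it is not how a real file system or database stores its index.

```python
import bisect

# Hypothetical records keyed by an integer id.
records = {17: "inode-a", 4: "inode-b", 42: "inode-c", 23: "inode-d", 8: "inode-e"}

# Hash-table lookup: a dict finds one exact key in expected constant time,
# but it keeps no key order, so a range query would have to scan every key.
exact = records[23]

# Ordered lookup: a sorted key list stands in for a tree's key ordering.
# Binary search locates both range boundaries in O(log n).
sorted_keys = sorted(records)


def range_query(low, high):
    """Return (key, value) pairs with low <= key <= high, in key order."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [(key, records[key]) for key in sorted_keys[start:end]]


in_range = range_query(8, 23)
print(exact)
print(in_range)
```

The point of the sketch is only that exact-match lookups favour hashing, while range scans and ordered iteration need a structure that keeps keys sorted.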
Closed 8 years ago.
Are there any databases that focus on insanely fast read/query performance? I am looking for a good NoSQL document database, and I don't plan on any significant writes (the database is only updated by me on a weekly schedule). The queries I will be running are keyword searches across multiple string fields, and interval searches that look for elements that fall within or overlap a given interval.
I was looking at Redis initially, but I needed something a bit more extensive than key/value to store my data. MongoDB looks like a good choice?
There are many possible solutions to your problem; the right one depends mostly on the scale and the actual use case.
Redis does quite a bit more than simple key/value storage, but I doubt that it's what you need right now.
For mostly read-only storage and these types of searches, I would recommend taking a look at Elasticsearch and/or Solr; either should do what you need and more.
Basically... fast to read, slow to write means lots of denormalisation. You can do that yourself, or (with some apps) you can let the database take care of it. But it's always a trade-off.
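To show what "doing the denormalisation yourself" could look like for the two query types in the question, here is a minimal in-memory sketch. The documents, field names and function names are all hypothetical, and this is not how Elasticsearch, Solr or MongoDB work internally. The keyword index is built once at write time, so reads become a dictionary lookup.

```python
from collections import defaultdict

# Hypothetical documents: two string fields plus a closed interval [start, end].
documents = [
    {"id": 1, "title": "Fast reads", "body": "caching and indexes", "start": 10, "end": 20},
    {"id": 2, "title": "Slow writes", "body": "rebuilding indexes weekly", "start": 15, "end": 30},
    {"id": 3, "title": "Interval trees", "body": "overlap queries", "start": 40, "end": 50},
]


def build_keyword_index(docs, fields=("title", "body")):
    """Map each lowercased word to the ids of documents that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for field in fields:
            for word in doc[field].lower().split():
                index[word].add(doc["id"])
    return index


def search_keyword(index, word):
    """Return sorted ids of documents containing `word` in any indexed field."""
    return sorted(index.get(word.lower(), set()))


def search_overlapping(docs, low, high):
    """Return ids of documents whose interval overlaps [low, high]."""
    # Two closed intervals overlap when each starts no later than the other ends.
    return [doc["id"] for doc in docs if doc["start"] <= high and low <= doc["end"]]


keyword_index = build_keyword_index(documents)  # rebuilt on each weekly update
indexes_hits = search_keyword(keyword_index, "indexes")
overlap_hits = search_overlapping(documents, 18, 25)
print(indexes_hits, overlap_hits)
```

The interval search here is a linear scan to keep the sketch short; at larger scale an interval tree or a range index in the search engine would replace it.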
Closed 9 years ago.
Today many companies provide analytics based on social media data. To do that, every company has to get the data from different social networks such as Twitter, Facebook, etc. It would be nice if we could go to one single data provider that supplies data from all social networking sites. That way, companies would not each have to build their own data infrastructure and could concentrate on analytics rather than on data fetching. http://www.gnip.com is one such data provider. Does anyone know of any more such data providers?
I can actually think of quite a few data providers like that. Gnip is obviously the big kid in the room...but a few others are:
http://www.datasift.com
http://www.collectiveintellect.com
http://www.spinn3r.com/ (Not as much of an all-encompassing aggregator, but should still work for the purposes you describe)
I'm sure there are others out there, but Gnip & these three (Datasift & Gnip in particular) seem like the biggest data providers of this sort.
Closed 9 years ago.
I am new to database schema design and I want to learn more about how a well-designed database schema is implemented in the real world.
Are there any places to find such schemas? Or is there a book that focuses on explanation through examples?
DatabaseAnswers.org (unfortunately now defunct, but well preserved in the Wayback Machine) is a great source of example database schemas.
I can also recommend Beginning Database Design, published by Apress. I own this book and can confirm that it is of high quality. The book looks at a number of real world scenarios and explains the impact a certain design decision could have on the way the database works and the quality of the data and its output.
Finally, I would advise building some small databases (e.g. contact management, a task list, etc.). Start by specifying some basic requirements, then create some tables and queries. You WILL make some mistakes, which is the best way of learning.
Here is a nice library of schemas to browse through.
http://www.databaseanswers.org/data_models
Closed 9 years ago.
I need to create two dropdowns for an app. One shows all the countries in the world; the other is filled with all the cities of the country selected in the first dropdown. Is there any web service or database that provides such information?
Given that you found by yourself the answer to your question, I'll give the answer for what you ask in the comment:
Maybe Boxer Text Editor will do the trick:
Large file capacity
edit files up to 2 GB in size
no theoretical limit on the number of open files
line lengths to 32K characters
total editing capacity is limited only by the operating system's virtual memory capabilities
Desmond
You can get a list of all the countries and all cities per country from a free MaxMind.com database that is only 33 MB.
http://www.maxmind.com/app/worldcities
Andrew
Part of the "OpenGeoCode.Org" Team
Geonames has several webservices which basically give you programmatic access to that 800m db.
I used the one at http://www.zuzemo.com
It has just about everything you need: countries, regions and cities.
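Whichever source you pick, the two dropdowns come down to grouping cities by country once and then looking up the selected country. Here is a minimal sketch of that step; the sample rows and the function names are hypothetical, and real data from any of the sources above would need its own parsing first.

```python
from collections import defaultdict

# Hypothetical sample rows in (country name, city name) form.
rows = [
    ("France", "Paris"),
    ("Japan", "Tokyo"),
    ("France", "Lyon"),
    ("Japan", "Osaka"),
    ("Brazil", "Recife"),
]


def group_cities_by_country(rows):
    """Build a mapping of country name to an alphabetical list of its cities."""
    cities = defaultdict(list)
    for country, city in rows:
        cities[country].append(city)
    return {country: sorted(names) for country, names in cities.items()}


cities_by_country = group_cities_by_country(rows)

# Options for the first dropdown.
country_options = sorted(cities_by_country)


def city_options(country):
    """Options for the second dropdown, given the first dropdown's selection."""
    return cities_by_country.get(country, [])


print(country_options)
print(city_options("France"))
```

Building the mapping once up front keeps each dropdown change to a single lookup rather than a scan over the whole city list.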