Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Not sure if this is the proper place to post this question, but I've seen questions regarding ISBN databases, so I thought it would be appropriate.
On my website, I intend to allow users to choose from all US colleges and universities (community colleges and four-year institutions). I would then store their selection in a database.
At first I thought about allowing them to input the name themselves, but saw some issues with that. I tried to look for a database of some sort, but all I found were search engines to find a specific university.
I was hoping to find a database that I can export to my own database (SQL Server), so that users can search my own copy.
Has anyone come across this issue and found a reasonable solution?
This is an old question, but I wanted to post the answer for those who find this page.
This should do the trick: http://ope.ed.gov/accreditation/GetDownloadFile.aspx
It's available as CSV and XLS files listing all the accredited universities in the US. There are about 22,800 rows; deduplicating by name brings that down to about 9,000.
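A minimal sketch of narrowing the rows to unique names, assuming the CSV has a column holding the institution name. The column name `Institution_Name` and the sample rows are hypothetical; check the header of the actual download before relying on it.

```python
import csv
import io


def unique_institution_names(csv_file, name_column="Institution_Name"):
    """Return sorted unique institution names from an open CSV file.

    name_column is an assumed header; adjust it to match the real file.
    """
    seen = {}
    for row in csv.DictReader(csv_file):
        name = (row.get(name_column) or "").strip()
        if name:
            # Key on a case-folded name so spelling variants in case collapse,
            # but keep the first spelling encountered for display.
            seen.setdefault(name.casefold(), name)
    return sorted(seen.values())


# Tiny made-up sample standing in for the downloaded file.
sample = io.StringIO(
    "Institution_Name,Campus_Name\n"
    "Example State University,Main\n"
    "Example State University,Downtown\n"
    "Sample Community College,Main\n"
)
names = unique_institution_names(sample)
print(names)
```

The resulting list could then be bulk-inserted into a lookup table in SQL Server.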
Enjoy!
This question was posted in 2011, but luckily data is getting easier to come by!
The Department of Education now has an API (with CSV downloads as well) that provides a variety of data about universities as well as public elementary and high schools.
Their Directory Listing CSV under 'Colleges and Universities' contains about 7,700 rows, which matches up pretty well with NCES data (from 2012) putting the number of post-secondary institutions in the US at around 7,000.
Disclaimer before you go write academic research with that database: considering that the NCES number is a couple of years out of date, it seems reasonable to assume this Dept of Ed listing is fairly accurate, though I haven't tested it rigorously.
You can combine these two approaches: provide a search box with autocomplete, and if the input does not match any entry, ask the user whether they want to add it to the database. Create a separate table to hold these contributions, so they are not added to the main list until you or someone else approves them.
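One way the approval table might look, as a rough sketch. It uses Python's built-in `sqlite3` as a stand-in for SQL Server, and the table and function names (`institutions`, `pending_institutions`, `suggest_institution`, `approve_institution`) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE institutions (name TEXT PRIMARY KEY);
    CREATE TABLE pending_institutions (name TEXT PRIMARY KEY);
    """
)


def suggest_institution(conn, name):
    """Queue a user-supplied name for review unless it is already known."""
    name = name.strip()
    known = conn.execute(
        "SELECT 1 FROM institutions WHERE name = ?", (name,)
    ).fetchone()
    if known is None:
        # OR IGNORE keeps repeated suggestions of the same name harmless.
        conn.execute(
            "INSERT OR IGNORE INTO pending_institutions (name) VALUES (?)",
            (name,),
        )


def approve_institution(conn, name):
    """Move a reviewed suggestion into the main list."""
    conn.execute("INSERT OR IGNORE INTO institutions (name) VALUES (?)", (name,))
    conn.execute("DELETE FROM pending_institutions WHERE name = ?", (name,))


suggest_institution(conn, "Example Technical College")
pending_before = [r[0] for r in conn.execute("SELECT name FROM pending_institutions")]
approve_institution(conn, "Example Technical College")
approved = [r[0] for r in conn.execute("SELECT name FROM institutions")]
pending_after = [r[0] for r in conn.execute("SELECT name FROM pending_institutions")]
print(pending_before, approved, pending_after)
```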
Googling for 'list of us universities' gives me a lot of hits.
You could have a textbox that lets users enter the name themselves but offers autocomplete, which helps them input a string that exactly matches one in your database (sort of like how Facebook autocompletes the friend search at the top right).
If the user ignores this and enters an unknown string, you could either add the new string to your database or refuse it, telling them to ask the admin to add it and then try again.
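The matching side of such an autocomplete could be sketched as a prefix search over a sorted list. This is only an illustration: the function name and sample names are invented, and a real site would more likely run the prefix query against the database.

```python
import bisect


def autocomplete(sorted_names, prefix, limit=5):
    """Return up to `limit` names starting with `prefix` (case-insensitive).

    sorted_names must already be sorted by lowercase form.
    """
    # In real use, compute the lowercase keys once rather than per call.
    keys = [n.lower() for n in sorted_names]
    p = prefix.lower()
    start = bisect.bisect_left(keys, p)  # first position that could match
    matches = []
    for name, key in zip(sorted_names[start:], keys[start:]):
        if not key.startswith(p) or len(matches) == limit:
            break
        matches.append(name)
    return matches


names = sorted(
    [
        "Sample Community College",
        "Example Technical College",
        "Example State University",
    ],
    key=str.lower,
)
hits = autocomplete(names, "exam")
misses = autocomplete(names, "zzz")
print(hits, misses)
```

An empty result is the point at which the page would either offer to submit the new name or refuse it.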
The most complete resource is IPEDS. You need to click on 'download survey' and download the data for the year you want. I called them to ask whether they have an API, but no luck there, so it's all in Excel format. Bummer.
http://nces.ed.gov/ipeds/datacenter/Default.aspx
Here is the free database of the major worldwide universities:
https://github.com/turalus/openDB
It's 9498 Universities from all over the world.
Their names are translated into 3 languages: English, Russian, Azerbaijani.
3072 of them have logos.
organized by countries
You can get a complete list from
http://www.webometrics.info/
It has complete world university names and ranks; you would just need to scrape them.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
A colleague of mine uses Excel to merge and analyse datasets (~10k lines).
Her spreadsheets are mazes of vlookup and nested if formulas.
How can I convince her to take a look at databases?
What would be a good way to start? I'm an sqlite fan, but wonder whether the entry threshold to Access is lower?
Are there any books that you'd recommend to get started? I checked this SO question, "What's a good book for introduction to databases for web developers". Any additions to the list there?
Thanks,
Simone
Re: How can I convince her to take a look at databases?
Show her why your way is better.
Redo what she did in Excel with your preferred tool and the same input data, and see if you can find differences in the output.
Also, after both systems are set up, run them side by side for a while, noting performance and maintenance differences. If she agrees your way is better, she might decide to use it.
Not a direct answer to your question, but as a developer who has done extensive data analysis work in Excel, here are a few observations.
If the primary goal is data analysis then using Excel might be good enough.
Especially if the different data sets (you mentioned merging) are provided as CSV files as and when required, going through the 'hassle' of first importing the data into a SQL database and then running queries to extract it for the analysis step might be too much.
Excel gives you the flexibility of playing around with your data, very easily trying different things, charting, pivot tables etc. If the reports that your friend needs are more or less static with only the data varying, then maybe a simple Access/SQL database with a small application on top would be a better solution. But then again, if this is the case, your friend probably has an Excel sheet with all the relevant formulas where only the data needs to be plugged in.
For most of my data-analysis in Excel the only real thing I have missed is the ability to gather data using foreign keys. Once you have that covered with vlookup, the rest of the analysis is usually quicker/easier in Excel.
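For comparison, the foreign-key lookup that VLOOKUP covers in Excel could be expressed as a join in a database. The sketch below uses SQLite through Python's `sqlite3` module; the `customers` and `orders` tables and their contents are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'North'), (2, 'South');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 2, 20.0), (12, 1, 30.0);
    """
)

# LEFT JOIN plays the role of VLOOKUP: each order row picks up the region
# of its customer; an unmatched order would get NULL rather than #N/A.
rows = conn.execute(
    """
    SELECT c.region, SUM(o.amount)
    FROM orders AS o
    LEFT JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
print(rows)
```

Whether this is easier than a pivot table over VLOOKUP columns depends on how often the analysis changes, which is the trade-off described above.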
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
When you are watching for news about a particular Wikipedia article via its RSS feed, it's annoying not to be able to filter the information, because most of the edits are spam, vandalism, minor edits etc.
My approach is to create filters. I decided to remove all edits that are identified only by the contributor's IP address rather than a nickname, because most such edits are spam (though there are some good contributions). This was easy to do with regular expressions.
I also removed edits that contained vulgarisms and other typical spam keywords.
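A rough sketch of what these two filters could look like. The patterns below are illustrative guesses, not the expressions actually used: the IP pattern covers IPv4 only, and the keyword list is a two-word placeholder.

```python
import re

# IPv4 only; anonymous edits from IPv6 addresses would need another pattern.
IPV4 = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")
# Placeholder keyword list; a real one would be much longer.
SPAM_WORDS = re.compile(r"\b(?:viagra|casino)\b", re.IGNORECASE)


def keep_edit(author, text):
    """Return True if the edit passes both filters."""
    if IPV4.match(author):
        return False  # contributor identified only by IP address
    if SPAM_WORDS.search(text):
        return False  # contains a typical spam keyword
    return True


edits = [
    ("Foo", "Fixed a citation in the history section"),
    ("192.0.2.17", "Fixed a typo"),
    ("Bar", "Best CASINO bonuses here"),
]
kept = [e for e in edits if keep_edit(*e)]
print(kept)
```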
Do you know of a better approach using algorithms or heuristics with regular expressions, AI, text-processing techniques etc.? The approach should be able to detect bad edits (minor edits or vandalism) and incrementally learn what is a good or bad contribution, updating its database accordingly.
thank you
There are many different approaches you can take here, but traditionally spam filters with incremental learning have been implemented using naive Bayes classifiers. Personally, I prefer the even easier to implement Winnow2 algorithm (details can be found in this paper).
First you need to extract features from the text you want to classify. Unfortunately the Wikipedia RSS feeds don't seem to be particularly machine readable, so you probably need to do some preprocessing. Alternatively you could directly use the MediaWiki API or see if one of the bot frameworks linked at the bottom of this page is of help to you.
Ideally you would end up with a list of words that were added, words that were removed, various statistics you can compute from that, and the metadata of the edit. I imagine the list of features would look something like this:
editComment: wordA (wordA appears in edit comment)
-wordB (wordB removed from article)
+wordC (wordC added to article)
numWordsAdded: 17
numWordsRemoved: 22
editIsMinor: Yes
editByAnIP: No
editorUsername: Foo
etc.
Anything you think might be helpful in distinguishing good from bad edits.
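A feature list like the one above could be produced roughly as follows. This is a sketch under simplifying assumptions: the old and new article text are already available as plain strings, words are compared as bags of words (position is ignored), and the function and parameter names are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_features(old_text, new_text, comment, username, is_minor, by_ip):
    """Turn one edit into a set of string features plus numeric statistics."""
    old_counts = Counter(tokenize(old_text))
    new_counts = Counter(tokenize(new_text))
    added = new_counts - old_counts    # words occurring more often after the edit
    removed = old_counts - new_counts  # words occurring less often after the edit

    features = {f"+{w}" for w in added}
    features |= {f"-{w}" for w in removed}
    features |= {f"editComment:{w}" for w in tokenize(comment)}
    features.add(f"editIsMinor:{'Yes' if is_minor else 'No'}")
    features.add(f"editByAnIP:{'Yes' if by_ip else 'No'}")
    features.add(f"editorUsername:{username}")
    stats = {
        "numWordsAdded": sum(added.values()),
        "numWordsRemoved": sum(removed.values()),
    }
    return features, stats


features, stats = extract_features(
    old_text="The cat sat on the mat",
    new_text="The dog sat on the mat today",
    comment="fix animal",
    username="Foo",
    is_minor=True,
    by_ip=False,
)
print(sorted(features), stats)
```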
Once you have extracted your features, it is fairly simple to use them to train the Winnow/Bayesian classifier.
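To give an idea of how little code the training step needs, here is a sketch of Winnow2 over sparse binary features, where each edit is represented as a set of feature strings. The fixed threshold of 4.0 and the factor of 2.0 are arbitrary choices for the demo, not recommended settings, and the sample features are invented.

```python
from collections import defaultdict


class Winnow2:
    """Winnow2 over sparse binary features (a set of active feature names)."""

    def __init__(self, alpha=2.0, threshold=4.0):
        self.alpha = alpha          # promotion/demotion factor, must be > 1
        self.threshold = threshold  # assumed fixed value for this sketch
        self.weights = defaultdict(lambda: 1.0)  # unseen features start at 1

    def score(self, features):
        return sum(self.weights[f] for f in features)

    def predict(self, features):
        """Return True if the edit is classified as bad."""
        return self.score(features) > self.threshold

    def train(self, features, is_bad):
        """Update weights only when the current prediction is wrong."""
        if self.predict(features) == is_bad:
            return
        # Missed a bad edit: promote active features. False alarm: demote them.
        factor = self.alpha if is_bad else 1.0 / self.alpha
        for f in features:
            self.weights[f] *= factor


clf = Winnow2()
bad = {"+casino", "editByAnIP:Yes"}
good = {"+citation", "editByAnIP:No"}
for _ in range(5):
    clf.train(bad, True)
    clf.train(good, False)
print(clf.predict(bad), clf.predict(good))
```

Because weights change only on mistakes, the same `train` call can be used later whenever a misclassified edit is corrected by hand, which gives the incremental learning asked for.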
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I have several SQL Server 2005 databases ranging from 20 – 600 tables in an application and no documentation. I am looking for a database diagramming tool that is smart enough to pick tables that seem to be related to one entity (e.g., tables related to Patient, tables related to Orders) or one functionality (e.g., Patient Management, Order Management) and show them separately instead of drawing the entire database.
In the past, I have seen tables related to one piece of functionality shown in one color in ER diagrams. In a well-designed database, there would perhaps be multiple schemas that group tables related to one functionality together. But all these tables are in one schema, so I want a tool that is smart enough to suggest which tables should go together under one schema. It won’t be perfect, but perhaps it could work out which tables belong together (for example, based on the relationships between them or on which tables are accessed together in the stored procs).
The bottom line is that I want to understand the data model as quickly as possible. A tool called Schema Spy ( http://schemaspy.sourceforge.net/ ) seems to be headed in the right direction, but I was wondering if anyone knows of better or more comprehensive tools.
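To make the relationship-based grouping concrete: one naive version treats foreign keys as edges of a graph and takes its connected components. The sketch below assumes the (child, parent) pairs have already been read from the catalog (in SQL Server, for instance, from `sys.foreign_keys`); the table names are invented.

```python
from collections import defaultdict


def group_tables(tables, foreign_keys):
    """Group tables into connected components of the foreign-key graph.

    foreign_keys is an iterable of (child_table, parent_table) pairs.
    """
    neighbours = defaultdict(set)
    for child, parent in foreign_keys:
        neighbours[child].add(parent)
        neighbours[parent].add(child)

    seen = set()
    groups = []
    for table in sorted(tables):
        if table in seen:
            continue
        # Depth-first walk collecting everything reachable from `table`.
        stack, group = [table], []
        seen.add(table)
        while stack:
            current = stack.pop()
            group.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


tables = ["Patient", "PatientAddress", "Orders", "OrderLine", "AuditLog"]
fks = [("PatientAddress", "Patient"), ("OrderLine", "Orders")]
groups = group_tables(tables, fks)
print(groups)
```

In a real schema a widely referenced lookup table would merge everything into one component, so such hub tables would probably need to be excluded first.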
Thanks.
Have you tried Visio at all? While it does not satisfy everything you asked for, it can reverse engineer a database and make very appealing diagrams with a little work.
I have never used it to understand an existing database, but I have used it to explain databases I have created.
You could have a look at wsSqlSrvDoc. It's a nice little tool that works with SQL Server extended properties and creates an MS Word document.
The print-out of all column properties (with foreign key relations) works out of the box. For further descriptions of each field, you have to set up extended properties for those columns in SQL Server Management Studio.
The downside, however, is that it's not free (but quite affordable). And if you just need to create documentation for a database that is more or less finished rather than a work in progress, then I'd guess the free trial would be enough.
This question is related to an older question: "A good database modeling tool?"
From the answer to this question, e.g. fabFORCE.net dbDesigner might be what you are looking for.
Closed 5 years ago.
For a program I am writing, I need a dictionary between Spanish and English words. I googled for a while, but I could not find any freely available database. Does anybody know where or how to get such a database (preferably a simple CSV or XML file)?
So far my best idea to create such a dictionary is to write a little program that looks up an English word on Wikipedia and uses the language links to extract the correct translation. But I don't want to make a million requests to Wikipedia just to generate this database...
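For reference, the language-link idea might look roughly like this with the MediaWiki API, which accepts several titles per request (joined with `|`), so the number of requests can be well below one per word. The response shape assumed in `parse_langlinks` is the default JSON format as I understand it and should be checked against the API documentation; the sample response is hand-written, not real data.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def langlinks_url(titles, target_lang="es"):
    """Build one MediaWiki API query URL covering several article titles."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),
        "lllang": target_lang,
        "lllimit": "max",
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_langlinks(response_text):
    """Map each returned page title to its list of translated titles."""
    pages = json.loads(response_text).get("query", {}).get("pages", {})
    return {
        page["title"]: [link["*"] for link in page.get("langlinks", [])]
        for page in pages.values()
    }


# Hand-written stand-in for an API response, so the sketch runs offline.
sample_response = json.dumps(
    {
        "query": {
            "pages": {
                "1": {
                    "title": "Dog",
                    "langlinks": [{"lang": "es", "*": "Perro"}],
                }
            }
        }
    }
)
url = langlinks_url(["Dog", "Cat"])
translations = parse_langlinks(sample_response)
print(url)
print(translations)
```

Article titles are not always dictionary words, so the output would still need cleaning before it could serve as a word-to-word mapping.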
I don't need anything fancy, just a mapping from one word to one or possibly multiple translations for this word. Just like a regular dictionary.
Ask around on the Omega Wiki, formerly known as the Ultimate Wiktionary or Wiktionary Z. They collect translations from all languages into all languages, and their data is available in a relational database.
Do you need to translate on the fly at runtime, or is this a one-time translation of labels and messages for a UI?
I'd say that runtime translation will be remarkably difficult, because you'll need more than a dictionary of words. Natural language processing is difficult in any language. Most languages need to know something about context to translate smoothly.
If it's a one-time translation of UI elements, I've had good luck using Google Translate to go from Japanese to English.
To answer your question, I don't have a database like that, sorry.
The problem with natural languages is that they are very context dependent, so the same word in English can mean many things in French. Take the English verb 'to know'. This can be translated into French as either 'savoir' (to know a fact) or 'connaître' (to know a person, or a town).
I'd be very interested to know if there exists such a database, but I doubt if it exists.
Sites like http://www.reverso.net hedge their bets by showing both results.
Closed 5 years ago.
I am in the process of writing an application which, among other functionality, generates MediaWiki documentation of an MSSQL database (objects, tables, table data).
My question is which document formats you prefer, or are required to produce. I have too many ideas to follow, so your answers should set my priorities ;)
(I know there are other documentation-related questions on SO, but they mostly deal with how to generate documentation (I know how to), and do not ask for specific doc types or platforms)
Edit:
Thanks for the comments. Actually I have table relations already, since I parse foreign keys. However full cross-reference may be a bit trickier ;)
However the question was meant to ask for the document types, such as Word, PDF, ODF, whatever. What are your professional requirements or preferences?
Update:
Overview of generated documentation
It sounds like you have already decided on a document format, which is HTML based on MediaWiki markup.
You should also generate Entity-Relationship Diagrams, which are useful additions to database documentation (though ERDs don't tell the whole story either).
Do you mean document organization, i.e. what headings and content should be included in each page?
Here are some suggestions:
Table Structure
  Column names, data types, constraints
  Meaning and usage of each column
  Extra logical constraints in triggers and application code
  Indexes defined
Relationships to other tables
  Tables dependent on this one
  Tables this one depends on
  Notes on special or implicit relationships that have no enforcement through database constraints
Usage of table
  Usage in stored procedures
  Usage in application code
  Usage in views
Who has read and/or write access; SQL privileges of each user or role
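As an illustration of that page organization, the outline above could be rendered into MediaWiki markup along these lines. This is only a sketch: the function name, its parameters, and the sample table are hypothetical, and only a few of the suggested sections are covered.

```python
def render_table_page(table_name, columns, depends_on, used_in):
    """Render MediaWiki markup for one table.

    columns: list of (name, data_type, meaning) tuples
    depends_on: names of tables this one references
    used_in: names of procedures or views that use the table
    """
    lines = [
        f"== {table_name} ==",
        "=== Table structure ===",
        '{| class="wikitable"',
        "! Column !! Type !! Meaning",
    ]
    for name, data_type, meaning in columns:
        lines.append("|-")  # row separator in wiki table syntax
        lines.append(f"| {name} || {data_type} || {meaning}")
    lines.append("|}")

    lines.append("=== Relationships to other tables ===")
    lines.extend(f"* Depends on [[{t}]]" for t in depends_on)

    lines.append("=== Usage of table ===")
    lines.extend(f"* {u}" for u in used_in)
    return "\n".join(lines)


page = render_table_page(
    "Patient",
    columns=[
        ("PatientId", "int", "Primary key"),
        ("Name", "nvarchar(100)", "Full name"),
    ],
    depends_on=["Clinic"],
    used_in=["usp_GetPatient"],
)
print(page)
```

Linking each related table with `[[...]]` gives the cross-references between pages for free, assuming one wiki page per table.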
There are other questions at StackOverflow that are very close to this one.
"How to document a database" is a very similar question to yours, since it's specifically about wiki documentation solutions.
"What are the best ways to understand an unfamiliar database" may give you some good tips, as you are creating documentation that would help someone in that situation.
"How do you document your database structure?" is related but not as closely, because it's about putting documentation into the metadata itself.
You might want to have a look at what the commercial vendors do regarding this. As Bill said, you certainly need an ER diagram. Commercial products to look at could include Embarcadero ER/Studio, Red-Gate SQL Doc, Power Designer and others.