Election 2008 data files - how to know which candidate received the financial contribution? - file-format

I was trying to programmatically go through presidential campaign contributions to see which web 2.0 people contributed to which candidates. You can get the data file indiv08.zip on the site http://www.fec.gov/finance/disclosure/ftpdet.shtml#a2007_2008
I can parse out who contributed and how much they contributed, but I cannot figure out who the contribution went to. There seems to be some ID, but it does not match the candidate IDs in any other file I can find. If you can help with this, I think I could make a pretty nice page showing who contributed to which candidate.
update -- I wonder if I just misunderstand this contribution system. Perhaps candidates cannot receive contributions at all, only committees? For example, I see "C00431445OBAMA FOR AMERICA" received a lot of contributions. That makes it a bit more complicated, since I then have to associate those committees with candidates. Basically I want to know who supported Obama and who supported McCain.

Page 3 of the tutorial linked at the top of the page you linked to lists the column names, including "Candidate Identification".
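If the individual contributions file only gives you a committee ID (as your update suggests), you can usually get to the candidate by going through the committee master (cm) file, which carries a candidate ID for committees authorized by a candidate such as "OBAMA FOR AMERICA", and then the candidate master (cn) file, which carries the candidate's name. A rough sketch of that two-step join, assuming the three files have been loaded into hypothetical tables named indiv_contributions, committee_master, and candidate_master with the column names used below (check the data dictionary on the FTP page for the real headers):

    -- Sketch only: table and column names are assumptions, not the actual FEC headers.
    SELECT   cn.cand_name,
             SUM(ic.transaction_amt) AS total_contributions
    FROM     indiv_contributions AS ic          -- one row per itemized contribution
    JOIN     committee_master    AS cm          -- maps committee -> candidate
             ON cm.cmte_id = ic.cmte_id
    JOIN     candidate_master    AS cn          -- candidate names
             ON cn.cand_id = cm.cand_id
    WHERE    cm.cand_id IS NOT NULL             -- PACs and party committees have no candidate link
    GROUP BY cn.cand_name
    ORDER BY total_contributions DESC;

Note that committees which are not a candidate's authorized committee (PACs, party committees) have no candidate ID, so those contributions cannot be attributed to Obama or McCain this way.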

Related

Query Table Based on Variable Number of Selection Criteria

I need to query a table in SQL Server based on a number of criteria, any of which could be null. Criteria are:
SaleId
City
ZipCode
County
JudgementAmountMin
JudgementAmountMax
AssessedValueMin
AssessedValueMax
Saledatestart
Saledateend
All of these can be used to filter a selection, or just some of them, or none of them. I came up with a solution, which broke, and after posting a question asking for help, the responses I received all told me my solution was crap. OK, great, my solution was crap, even though I used a book on stored procedures to guide me and copied most of my solution from the example printed in the text. So how do I build a solution that isn't crap?
Keep in mind that I should be considered a fairly new SQL Server practitioner, and that I would need more than a terse two-line response referring to some general process that those who are already experienced could narrow down into a solution. I am willing to learn and work at finding a solution, but I ask that the response be more newbie friendly than being told that my solution is no good and asking why anyone would do it that way. Maybe because they followed an example in a book and didn't know any better. Examples are very helpful!
Remember everyone has to start somewhere. I am deaf, so going to classes is not an option that has worked for me. I really am trying, I just need some help.
Regards
Tom
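For what it's worth, one widely used pattern for this kind of "every criterion is optional" search is to test each parameter for NULL inside the WHERE clause. Below is a minimal sketch; the table name (Sales) and column names are assumptions based on the criteria listed in the question, not Tom's actual schema:

    -- Sketch of an optional-parameter search procedure (hypothetical schema).
    CREATE PROCEDURE dbo.SearchSales
        @SaleId              INT           = NULL,
        @City                NVARCHAR(100) = NULL,
        @ZipCode             NVARCHAR(10)  = NULL,
        @County              NVARCHAR(100) = NULL,
        @JudgementAmountMin  MONEY         = NULL,
        @JudgementAmountMax  MONEY         = NULL,
        @AssessedValueMin    MONEY         = NULL,
        @AssessedValueMax    MONEY         = NULL,
        @SaleDateStart       DATETIME      = NULL,
        @SaleDateEnd         DATETIME      = NULL
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT *
        FROM dbo.Sales
        WHERE (@SaleId             IS NULL OR SaleId          = @SaleId)
          AND (@City               IS NULL OR City            = @City)
          AND (@ZipCode            IS NULL OR ZipCode         = @ZipCode)
          AND (@County             IS NULL OR County          = @County)
          AND (@JudgementAmountMin IS NULL OR JudgementAmount >= @JudgementAmountMin)
          AND (@JudgementAmountMax IS NULL OR JudgementAmount <= @JudgementAmountMax)
          AND (@AssessedValueMin   IS NULL OR AssessedValue   >= @AssessedValueMin)
          AND (@AssessedValueMax   IS NULL OR AssessedValue   <= @AssessedValueMax)
          AND (@SaleDateStart      IS NULL OR SaleDate        >= @SaleDateStart)
          AND (@SaleDateEnd        IS NULL OR SaleDate        <= @SaleDateEnd)
        OPTION (RECOMPILE);  -- re-plan per call so unused filters don't distort the query plan
    END;

Each (@Param IS NULL OR Column = @Param) pair simply switches a filter off when its parameter is not supplied; if this ever becomes too slow on a large table, dynamic SQL built with sp_executesql is the usual next step.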

sql, product inventory module

I have been tasked with creating a product inventory module. After reading all the posts I can find on Stack Overflow, I have decided the best way is to not keep a separate, running ‘balance’, but to create one on the fly. I have attached a representation of the tables involved.
Actually, it seems like I don't have enough reputation points to include a picture, so here is a link to a dropbox file:
So I have two questions, which are somewhat related, so it seems like I should include them in the same question posting, though I am not a frequent poster and a SQL noob. So please excuse me if I am displaying my ignorance with posting or SQL.
First, does this look correct (I named all the columns as descriptively as possible)? I have to create reports that show the current inventory balance for all products and for individual products, as well as a ‘Transaction Register’ with a running balance.
Second, provided the first answer is yes, is this a good candidate for creating a view?
Complex question. Difficult to answer without understanding the full scope of the project. One point - I see there is no Current On-hand table. I agree that the running balance at any point in time is best calculated on the fly. It is, however, common practice to keep a current on-hand table. This gives you the on-hand inventory and values without having to sum up the transactions. This is the approach in Microsoft Axapta and other products I have worked with.
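To make the "calculate it on the fly" approach concrete: the current balance usually boils down to a SUM per product, and the transaction register to a windowed running SUM. A minimal sketch, assuming a hypothetical InventoryTransaction table where receipts are stored as positive quantities and issues as negative ones (the real table and column names are in the linked diagram, which is not visible here):

    -- Sketch only: table and column names are assumptions, not the poster's schema.
    CREATE VIEW dbo.vCurrentInventoryBalance
    AS
    SELECT   ProductId,
             SUM(Quantity) AS OnHandQuantity   -- receipts positive, issues negative
    FROM     dbo.InventoryTransaction
    GROUP BY ProductId;
    GO

    -- Transaction register with running balance (needs SQL Server 2012+ for the window frame).
    SELECT  ProductId,
            TransactionDate,
            Quantity,
            SUM(Quantity) OVER (PARTITION BY ProductId
                                ORDER BY TransactionDate, TransactionId
                                ROWS UNBOUNDED PRECEDING) AS RunningBalance
    FROM    dbo.InventoryTransaction;

The first query is exactly the kind of thing that works well as a view; whether to also maintain a physical current on-hand table, as suggested above, is mostly a question of data volume and how often the balance is read.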

Making Solr understand English

I'm trying to set up Solr so that it understands English. For example, I've indexed our company website (www.biginfolabs.com), but it could be any other website or our own data.
If I put in English-like queries, I should get a one-word answer, just like Google does. Example queries:
Where is India located?
Who is the father of Obama?
Workaround:
Integrated UIMA and Mahout with Solr (person name and city name extraction is done).
I read the book "Taming Text" and implemented https://github.com/tamingtext/book, but did not get what I want.
Can anyone please tell me how to move forward? It can be anything; our team is ready to do it.
This task is called Named Entity Recognition. You can look at this tutorial to see how they use Solr for extracting atomic elements in text into predefined categories such as names of persons, organizations, locations, expressions of time, quantities, monetary values, percentages, etc., and then learning a model to answer queries.
Also have a look at Stanford NLP for more ideas on algorithms that you can use.

Variations in spelling of first name

As part of a contact management system I have a large database of names. People frequently edit this and as a result we run into issues of the same person existing in different forms (John Smith and Jonathan Smith). I looked into word similarity but it's easy to think of name variations which are not similar at all (Richard vs Dick). I was wondering if there was a list of common English first name variations that I could use to detect and correct such errors.
I would crawl all Wikipedia pages on people's names (there is an available dump of Wikipedia data), e.g., http://en.wikipedia.org/wiki/Teresa (from http://en.wikipedia.org/wiki/Category:English_given_names), and create an index that you can use to suggest canonical forms to people (you would rank them by the number of first-name variants in your database). Unfortunately, I do not know of such a ready-made database.
This thread points to a list of nickname/first name maps from the census:
http://deron.meranda.us/data/nicknames.txt
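Once a list like that is loaded into a lookup table, finding probable duplicates becomes a join on (canonical first name, last name). A rough sketch, with hypothetical Contacts(ContactId, FirstName, LastName) and NicknameMap(Nickname, CanonicalName) tables, the map populated from the census-derived file above:

    -- Sketch: contacts that collapse to the same canonical first name + last name.
    WITH Canonical AS (
        SELECT c.ContactId,
               c.LastName,
               COALESCE(n.CanonicalName, c.FirstName) AS CanonicalFirstName
        FROM Contacts AS c
        LEFT JOIN NicknameMap AS n
               ON n.Nickname = c.FirstName      -- e.g. 'Dick' -> 'Richard'
    )
    SELECT a.ContactId,
           b.ContactId AS PossibleDuplicateId,
           a.CanonicalFirstName,
           a.LastName
    FROM Canonical AS a
    JOIN Canonical AS b
         ON  a.LastName           = b.LastName
         AND a.CanonicalFirstName = b.CanonicalFirstName
         AND a.ContactId          < b.ContactId;   -- report each candidate pair once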

How does Wikipedia avoid duplicate entries?

How can websites as big as Wikipedia sort duplicated entries out?
I need to know the exact procedure, from the moment a user creates the duplicate entry onward. If you don't know Wikipedia's exact procedure but you know a method, please share it.
----update----
Suppose there is wikipedia.com/horse and somebody afterward creates wikipedia.com/the_horse. This is a duplicate entry! It should be deleted, or maybe redirected to the original page.
It's a manual process
Basically, sites such as wikipedia and also stackoverflow rely on their users/editors not to make duplicates or to merge/remove them when they have been created by accident. There are various features that make this process easier and more reliable:
Establish good naming conventions ("the horse" is not a well-accepted name, one would naturally choose "horse") so that editors will naturally give the same name to the same subject.
Make it easy for editors to find similar articles.
Make it easy to flag articles as duplicates or delete them.
Make sensible restrictions so that vandals can't mis-use these features to remove genuine content from your site.
Having said this, you still find a lot of duplicate information on wikipedia --- but the editors are cleaning this up as quickly as it is being added.
It's all about community (update)
Community sites (like wikipedia or stackoverflow) develop their procedures over time. Take a look at Wikipedia:about, Stackoverflow:FAQ, or meta.stackoverflow. You can spend weeks reading about all the little (but important) details of how a community builds a site together and how they deal with the problems that arise. Much of this is about rules for your contributors --- but as you develop your rules, many of their details will be put into the code of your site.
As a general rule, I would strongly suggest starting a site with a simple system and a small community of contributors who agree on a common goal, are interested in reading the content of your site, like to contribute, and are willing to compromise and to correct problems manually. At this stage it is much more important to have an "identity" for your community and mutual help than to have many visitors or contributors. You will have to spend much time and care dealing with problems as they arise and delegating responsibility to your members. Once the site has a basis and a commonly agreed direction, you can slowly grow your community. If you do it right, you will gain enough supporters to share the additional work amongst the new members. If you don't care enough, spammers or trolls will take over your site.
Note that Wikipedia grew slowly over many years to its current size. The secret is not "get big" but "keep growing healthily".
Having said that, stackoverflow seems to have grown at a faster rate than wikipedia. You may want to consider the different trade-off decisions that were made here: stackoverflow is much more restrictive about letting one user change another user's contribution. Bad information is often simply pushed down to the bottom of a page (low ranking). Hence, it will not produce articles like wikipedia does. But it's easier to keep problems out.
I can add one to Yaakov's list:
* Wikipedia makes sure that after merging the information, "The Horse" points to "Horse", so that the same wrong title can not be used a second time.
EBAGHAKI, responding to your last question in the comments above:
If you're trying to design your own system with these features, the key one is:
Make the namespace itself editable by the community that is identifying duplicates.
In MediaWiki's case, this is done with the special "#REDIRECT" command -- an article created with only "#REDIRECT [[new article title]]" on its first line is treated as a URL redirect.
The rest of the editorial system used in MediaWiki is depressingly simple -- every page is essentially treated as a block of text, with no structure, and with a single-stream revision history that any reader can add a new revision to. Nothing automatic about any of this.
When you try to create a new page, you are shown a long message encouraging you to search for the page title in various ways to see whether an existing page is already there -- many sites have similar processes. Digg is a typical example of one with an aggressive, automated search that tries to convince you not to post duplicates -- you have to click through a screen listing potential duplicates and affirm that yours is different before you are allowed to post.
I assume they have a procedure that removes extraneous words such as 'the' to create a canonical title, and if it matches an existing page not allow the entry.
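Purely to illustrate that guess (this is not how MediaWiki actually works), a canonical-title check could look something like the sketch below, against a hypothetical Pages(Title) table:

    -- Illustrative only: crude canonical form = lower case, underscores to spaces,
    -- leading "the " stripped; flag the new title if the canonical form already exists.
    DECLARE @NewTitle  NVARCHAR(255) = N'The_Horse';
    DECLARE @Canonical NVARCHAR(255) = LOWER(REPLACE(@NewTitle, N'_', N' '));
    IF @Canonical LIKE N'the %' SET @Canonical = STUFF(@Canonical, 1, 4, N'');

    IF EXISTS (
        SELECT 1
        FROM dbo.Pages
        WHERE CASE WHEN LOWER(REPLACE(Title, N'_', N' ')) LIKE N'the %'
                   THEN STUFF(LOWER(REPLACE(Title, N'_', N' ')), 1, 4, N'')
                   ELSE LOWER(REPLACE(Title, N'_', N' '))
              END = @Canonical
    )
        PRINT 'Possible duplicate of an existing page';
    ELSE
        PRINT 'No obvious duplicate found';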
