Where can we get a dataset for an ML project? - dataset

Where can I get a dataset for an ML project? My topic is the side effects of a particular drug by age and gender, with about 400k records.

There are plenty of datasets online. I like data.world because of the tools and community it provides. Whatever set you use, make sure the data is ethically sourced and diverse.

Related

Need help stringing together database processes

I need some help from those with more knowledge than I possess. I am currently trying to figure out how to get real-time data from a database.
I need to be able to find the company info from the most recent licensees. So the search parameter I'm using is 2016-05-10T00:00:00.000
The full request string, combining the API endpoint and the search parameter, can be found at this link:
https://www.hurl.it/?method=GET&url=https%3A%2F%2Fdata.wa.gov%2Fresource%2Fv8vv-gqqs.json&headers=%7B%22X-App-Token%22%3A[%22bjp8KrRvAPtuf809u1UXnI0Z8%22]%7D&args=%7B%22licenseeffectivedate%22%3A[%222004-07-14T00%3A00%3A00.000%22]%7D
So I'm looking to retrieve the most recently added accounts in order to verify that (1) the license is active and (2) the license number the contractor gives matches what the website says. I would like to automate this so that I'll know when the newest licenses are added, and they will be extracted/downloaded into Excel.
If anyone can help with this I would appreciate it very much. I also have more questions about using databases if any of you are experts in the field.
Once again, thank you!
Clay
Since your goal is to get this data into Excel, have you considered using something like our OData support instead? You could structure your query in Excel PowerBI and it'd automatically refresh the data.
Another option would be to use our CSV output type with an Excel web query. I use the IMPORTDATA(...) function in Google Sheets, which is very similar.
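As a rough illustration of the CSV option, the sketch below builds a query URL that could be pasted into an Excel web query or IMPORTDATA(...). It assumes the dataset accepts the standard SoQL `$where`, `$order`, and `$limit` parameters and that `licenseeffectivedate` is stored as a timestamp column. The function name and the default limit are made up for this example.

```python
from urllib.parse import urlencode

# Dataset endpoint from the question; the ".csv" suffix selects CSV output.
BASE_URL = "https://data.wa.gov/resource/v8vv-gqqs.csv"


def build_recent_licenses_url(since, limit=1000):
    """Return a URL for licenses effective on or after `since`, newest first.

    Hypothetical helper: assumes the column can be compared as a timestamp.
    """
    params = {
        "$where": f"licenseeffectivedate >= '{since}'",
        "$order": "licenseeffectivedate DESC",
        "$limit": limit,
    }
    # urlencode percent-encodes "$", quotes, and colons so the URL is safe to paste.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_recent_licenses_url("2016-05-10T00:00:00.000")
print(url)
```

Filtering with "on or after" rather than an exact date match means a scheduled refresh would pick up newly added licenses without changing the query.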

Training Natural language Classifier in IBM watson

I want to train NLC in such a way that:
If I give an input such as "Sharpies", "Cakes", or "iPhone6", it should return "order" as the intent.
But this doesn't work for all products. The intent should be returned for every product name: I would like to train NLC with a few product names and have it work for all products (dynamically).
As we have thousands of products, how can I get the intent "order" for all products without adding them all to the ".csv"? (I don't want to hard-code all the product names.)
Can you please help me retrieve the exact intent when dynamic product names are given as input to NLC?
What you are trying to do is not what NLC is intended for.
The purpose of intent is to understand what it is the end user is trying to achieve, not what products/keywords may appear in a sentence.
For example, "I want to buy an iPhone" vs "I want to unlock my iPhone". Both mention iPhone but have two very different intents. In this case, with training, you can distinguish between wanting to purchase and wanting to unlock.
One option you can try is looking at the Alchemy API entity extraction.
Another option is to use Watson Explorer Studio, but you will need Watson Explorer to get it. Watson Knowledge Studio is coming soon and, like WEX-Studio, allows you to build custom annotators. You can use these annotators with UIMA to parse your text.
So you could easily build something to understand that "I don't want to buy an iPhone" is not the same as "I want to buy an iPhone", and have it extract iPhone as a product.
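To make the distinction concrete, here is a toy, rule-based sketch that reports which products are mentioned separately from whether the request is negated. It is not the Alchemy API or a WEX-Studio annotator. The product list, the negation pattern, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical catalogue; in practice this would be loaded from the product database.
PRODUCTS = ["iPhone", "Sharpies", "Cakes"]

# Very rough negation cue; a real annotator would handle far more cases.
NEGATION = re.compile(r"\b(don't|do not|not|never)\b", re.IGNORECASE)


def extract_products(text):
    """Return the catalogue products mentioned in the text (case-insensitive)."""
    lowered = text.lower()
    return [p for p in PRODUCTS if p.lower() in lowered]


def analyse(text):
    """Report mentioned products separately from whether the request is negated."""
    return {
        "products": extract_products(text),
        "negated": bool(NEGATION.search(text)),
    }


print(analyse("I don't want to buy an iPhone"))
print(analyse("I want to buy an iPhone"))
```

Both sentences yield the same product, and only the negation flag differs. That is why product names alone are a poor signal for intent.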
There is an old, unsupported free version of WEX-Studio called Languageware, if you want to see whether that can help. That site contains a manual and videos. Here is a video I made that gives an example of how you would use it.

Generate a series of documents based on SQL table

I am trying to formulate a proposal for an application that allows a user to print a batch of documents based on data stored in a SQL table. The SQL table indicates which documents are due and also contains all demographic information. This is outside of what I normally do, and I am trying to see if there is a platform/application that already exists for such a task.
For example
List of all documents: Document #1 - Document #10
Person 1 is due for document #: 1,5,7,8
Person 2 is due for document #: 2,6
Person 3 is due for document #: 7,8,10
etc
Ideally, what I would like is for the user to be able to push a button and get a printed stack of documents that have been customized for each person, including basic demographic info like name, DOB, etc.
As I said at the top, I already have all of the needed information in a database. I am just trying to figure out the best approach to move that information onto a document.
I have done some research and found that some people have used mail merge in Word or Access as a front end, but I don't know if this is the best way. I've also found this document. Any advice would be greatly appreciated.
If I understand your problem correctly, it is two-fold: firstly, you need to find a way to generate documents based on data (mail-merge), and secondly, you might need to print them too.
For document generation you have two basic approaches: template-based, and programmatic generation from scratch. I suppose that you will opt for a template-based approach, which basically means that you design (in MS Word) a template document (Word, RTF, ...) containing placeholders and other tags that designate the »dynamic« parts of the document. Then, at document generation time, you pass this template and the data to a .NET library/processor, which populates the template with the data and returns the resulting document.
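The template-based idea can be sketched in a few lines. The example below uses Python's standard `string.Template` and plain-text templates rather than a .NET library and Word documents, purely to show the flow. The template texts, field names, and sample row are invented for illustration.

```python
from string import Template

# Hypothetical templates keyed by document number; $name and $dob mark the dynamic parts.
TEMPLATES = {
    1: Template("Document #1 for $name (DOB: $dob)"),
    5: Template("Document #5 for $name (DOB: $dob)"),
}

# Hypothetical rows as they might come back from the SQL table.
PEOPLE = [
    {"name": "Person 1", "dob": "1980-01-01", "due": [1, 5]},
]


def generate_documents(people, templates):
    """Fill every due template with the person's demographic data."""
    docs = []
    for person in people:
        for doc_no in person["due"]:
            template = templates[doc_no]
            docs.append(template.substitute(name=person["name"], dob=person["dob"]))
    return docs


for doc in generate_documents(PEOPLE, TEMPLATES):
    print(doc)
```

A real solution would replace the string templates with Word templates and hand the generated files to a print queue, but the loop over people and their due documents stays the same.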
One way to achieve this functionality would be employing MS Word's native mail-merge, but you should know that this would involve using Office COM and Word Application Automation, which should almost always be avoided.
Another option is to build such a system on top of the Open XML SDK. This is a valid option, but it will be a pretty demanding task and will most probably cost you much more than buying a commercial .NET library that does mail-merge out of the box (been there, done that). But of course, the good side here is that you will be able to tailor the solution to your needs. If you go down this road, I recommend that you use Content Controls for tagging documents/templates. The solution with CCs will be much easier to implement than the solution with bookmarks.
I'm not very familiar with the open source solutions and I'm not sure how many there are that can do mail-merge. One I know is FlexDoc (on CodePlex), but its problem is that it uses a construct (XmlControl) for tagging that is deprecated in Word 2010+.
Then there are commercial solutions. Again, I don't know them in detail, but I know that the majority of them are general-purpose document processing libraries. Our company has been using this document generation toolkit for some time now and I can say it covers all our »template-based document generation« needs. It doesn't require MS Word at document generation time, it has a really helpful add-in for MS Word, and you only need several lines of code to integrate it into your project. Templating is very powerful and you can set up a template in a very short time. While templates are Word documents, you can generate PDF or XPS docs as well. XPS is useful because you can use the .NET/WPF printing framework, which works with XPS docs, to print documents. This is a very high-end solution, but of course, the downside here is that it is not free.

Dataset with more than three values

I am planning to create a recommender system using Apache Mahout.
I searched the internet and found that it uses the following format for the dataset file:
userId, itemId, preference
The dataset I want to use has a structure like this:
Id, rating, location, skills, fee
Is there any way I can do this?
Or do I have to use Weka?
It provides the option of creating a custom dataset, but reviews suggest that it is not as good an option as Mahout for a recommender system.
Are you planning to do collaborative filtering? Usually with CF you take in lots of user preferences about items. Then for a given user you recommend items. You don't seem to have user preferences.
In any case, you will need to preprocess your data into the required form, since that is all that will be used in CF anyway.
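A minimal preprocessing sketch follows, assuming each record can be tied to a user. The `user_id` field is hypothetical and does not appear in the structure given in the question. The function keeps only the three fields Mahout's file format expects and drops the rest.

```python
import csv
import io


def to_mahout_triples(rows, user_key, item_key, pref_key):
    """Reduce wider records to 'userId,itemId,preference' lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[user_key], row[item_key], row[pref_key]])
    return out.getvalue()


# Hypothetical records: "user_id" is an assumed extra column identifying who gave the rating.
rows = [
    {"user_id": 1, "id": 10, "rating": 4.5, "location": "A", "skills": "x", "fee": 30},
    {"user_id": 2, "id": 10, "rating": 3.0, "location": "A", "skills": "x", "fee": 30},
]

triples = to_mahout_triples(rows, "user_id", "id", "rating")
print(triples)
```

Attributes such as location, skills, and fee are not used by plain collaborative filtering, so they are simply left out here.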
Try to understand this example:
https://github.com/apache/mahout/tree/master/examples/src/main/java/org/apache/mahout/cf/taste/example/bookcrossing
I hope it helps.

Web application for managing an election campaign

I’m trying to help a friend in his election campaign.
We mainly need a tool to manage a list of possible voters. We need to be able to:
1. Easily update details about the voters, and
2. Query for voters according to various parameters, and show and print the resulting lists
To enable campaigners to work from multiple workstations, we would like the system to be distributed, probably web based.
We would also like that to be in Hebrew, if possible.
Is there any existing tool that easily enables it?
If not, can you recommend an easy way to implement such a tool?
(I have a solid programming knowledge, but not much time to devote to that)
You can achieve this easily with iFreeTools Creator. Just create the entities and attributes for Voters and add campaigners as users, providing their Google email-id.
Regarding your requirements:
* This app is web-based. It runs on Google App Engine.
* The interface is English only, but data can be in Unicode. Entity names and attribute names are also "data", so they can be in Unicode too.
Other related features which might be useful in this context:
* You can import voter list using CSV files.
* Campaigners can search for voters near their workstation by filtering out records based on nearness to a geo-location.
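For readers who end up implementing their own tool, filtering by nearness to a geo-location can be sketched with the haversine formula. This is a generic illustration, not a description of how iFreeTools Creator does it. The record layout, field names, and coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def voters_near(voters, lat, lon, radius_km):
    """Keep only the voters within radius_km of the given location."""
    return [v for v in voters if haversine_km(lat, lon, v["lat"], v["lon"]) <= radius_km]


# Hypothetical voter records with coordinates.
voters = [
    {"name": "A", "lat": 32.09, "lon": 34.78},
    {"name": "B", "lat": 31.77, "lon": 35.21},
]

nearby = voters_near(voters, 32.0853, 34.7818, radius_km=10)
print([v["name"] for v in nearby])
```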
// Disclosure : I wrote code for this web-app. Hope you like it. Feedback welcome.
Some possible answers might be found in the same question I asked in the web apps forum

Resources