I want to do social network analysis using Hadoop in the telecom industry, and I'm looking for a dataset to work with. Does anyone know of a good dataset for analyzing relationships between users?
Many thanks!
Here are some links to social network analysis datasets:
Provided by Arizona State University
UCINET IV
Provided by Stanford University
I am a data science student. I am currently researching traffic patterns in Ahmedabad, Gujarat, India, and I need traffic data for the city.
I would greatly appreciate it if you could provide me with any traffic data that is available for Ahmedabad. This can include information on traffic flow, accidents, delays, and any other relevant data.
For example, consider this dataset: https://www.kaggle.com/datasets/fedesoriano/traffic-prediction-dataset
P.S. If not Ahmedabad, any Indian city will work.
I'll be short to save your time :)
I'm new to Stack Overflow and also new to IBM Watson.
We are building an EMR (electronic medical records) system and would be glad to enhance it with Watson cognitive capabilities for healthcare.
Where do I start from?
Has anyone here used a cognitive approach for assisted medical decision making? Can anyone point me in the right direction?
I thought I would start with Q&A for doctors, but Q&A has been deprecated by IBM. Predictive analytics would also be exciting for physicians, but what is the starting point?
Thank you in advance!
I think you are referring to a deprecated Bluemix API for health. One thing you can do is use the Retrieve and Rank API on a trusted set of documents.
Yes, I have used the following for health with IBM Watson:
Reading chest X-rays - https://www.ibm.com/watson/developercloud/doc/visual-recognition/
Reading EKGs - https://www.ibm.com/watson/developercloud/doc/visual-recognition/
Patient diagnosis for chest pain - https://www.ibm.com/watson/developercloud/dialog.html
Physical exam (we started using Retrieve and Rank for machine learning on a patient's physical exams over the years) - https://www.ibm.com/watson/developercloud/retrieve-rank.html
Speech to text (patient telling Watson where it hurts) - https://www.ibm.com/watson/developercloud/speech-to-text.html
As you can see, there are many different Watson APIs.
I've been trying to use the IBM Watson Document Conversion service with the demo PDF, but it's not splitting the document into smaller pieces. All it does is create one really long answer unit:
"text": "Watson is an artificially intelligent computer system capable of answering questions posed in natural language,[2] developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first CEO and industrialist Thomas J. Watson.[3][4] The computer system was specifically developed to answer questions on the quiz show Jeopardy![5] In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings.[3][6] Watson received the first place prize of $1 million.[7] Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage[8] including the full text of Wikipedia,[9] but was not connected to the Internet during the game.[10][11] For each clue, Watson's three most probable responses were displayed on the television screen. Watson consistently outperformed its human opponents on the game's signaling device, but had trouble responding to a few categories, notably those having short clues containing only a few words. In February 2013, IBM announced that Watson software system's first commercial application would be for utilization management decisions in lung cancer treatment at Memorial Sloan- Kettering Cancer Center in conjunction with health insurance company WellPoint.[12] IBM Watson's former business chief Manoj Saxena says that 90% of nurses in the field who use Watson now follow its guidance.[13]"
Thanks in advance!
Unfortunately, that demo PDF is not the best document to use. Currently, Answer Units are split based on heading tags (h1 - h6), and that PDF doesn't contain any headings. =(
If you set the conversion_target to NORMALIZED_HTML, you'll be able to see the converted PDF before it is split up into Answer Units. It will contain paragraphs but no headings.
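For illustration, the configuration for that request can be sketched as a small JSON object. This is only a minimal sketch: `conversion_target` and `NORMALIZED_HTML` come from the text above, while the helper name `build_conversion_config` is hypothetical, and how the JSON is attached to the actual conversion request is not shown here.

```python
import json


def build_conversion_config(target="NORMALIZED_HTML"):
    """Return a JSON string that selects the conversion target.

    Hypothetical helper for illustration; it only serializes the setting
    and does not call the service.
    """
    return json.dumps({"conversion_target": target})


config = build_conversion_config()
print(config)
```

Viewing the NORMALIZED_HTML output first makes it easy to check whether the document has any headings for the splitter to work with.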
In the future, we expect to also allow splitting Answer Units by paragraph, but that hasn't been released yet.
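To see why a document without headings ends up as a single unit, here is a rough sketch of heading-based splitting. It is not the service's actual implementation, just an approximation using Python's standard `html.parser`; the class `HeadingSplitter`, the function `split_units`, and the sample HTML strings are all made up for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSplitter(HTMLParser):
    """Collect text into units, starting a new unit at each h1-h6 tag."""

    def __init__(self):
        super().__init__()
        self.units = [[]]  # each unit is a list of text fragments

    def handle_starttag(self, tag, attrs):
        # Start a new unit at a heading, unless the current unit is still empty.
        if tag in HEADINGS and self.units[-1]:
            self.units.append([])

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.units[-1].append(text)


def split_units(html):
    """Return one text string per heading-delimited unit."""
    parser = HeadingSplitter()
    parser.feed(html)
    parser.close()
    return [" ".join(unit) for unit in parser.units if unit]


no_headings = "<p>First paragraph.</p><p>Second paragraph.</p>"
with_headings = "<h1>Intro</h1><p>First.</p><h2>Details</h2><p>Second.</p>"

print(len(split_units(no_headings)))    # one unit: nothing to split on
print(len(split_units(with_headings)))  # one unit per heading
```

With only paragraphs in the input, everything lands in the same unit, which matches the single long answer unit in the question.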
UPDATE:
We updated the PDF on the demo site with one that's a much better example.
I hope someone can help me with this topic, even though it's not a specific programming question.
I'm writing a bachelor's thesis in which I compare MySQL to MongoDB, and I want to write something about YouTube, as the platform has to handle many requests with a heavy data load.
The only good resource I found was this video: Seattle Conference on Scalability: YouTube Scalability
As the conference was in 2007, I imagine there have been some changes to the database setup since then.
The latest information I have from this talk is that the thumbnails are stored in a BigTable database and the metadata in MySQL. Have there been any changes since then?
Where are the videos stored? Is there an entry in a MySQL table that refers to the stored video?
Thanks in advance for the answer!
According to this, YouTube still uses MySQL: http://code.google.com/p/vitess/wiki/ProjectGoals
I am not sure how things are at YouTube, but I am in the process of developing a similar application for our client. We are making use of the best of both worlds, i.e. SQL and NoSQL.
We store the videos on disk and store the paths to these videos in a MySQL table. We then have a separate table that holds the genre-to-video mapping, i.e. which video belongs to which genre.
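A minimal sketch of that layout, using SQLite through Python's `sqlite3` module as a stand-in for MySQL so it runs without a server. All table names, column names, and sample rows are hypothetical, and the separate `genres` lookup table is an assumption added to make the mapping table complete.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    file_path TEXT NOT NULL          -- location of the video file on disk
);
CREATE TABLE genres (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE video_genres (          -- which video belongs to which genre
    video_id INTEGER NOT NULL REFERENCES videos(id),
    genre_id INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (video_id, genre_id)
);
""")

# Sample rows (made up for illustration).
conn.execute(
    "INSERT INTO videos (id, title, file_path) VALUES (?, ?, ?)",
    (1, "Demo clip", "/var/media/demo.mp4"),
)
conn.execute("INSERT INTO genres (id, name) VALUES (?, ?)", (1, "Tutorial"))
conn.execute("INSERT INTO video_genres (video_id, genre_id) VALUES (?, ?)", (1, 1))

# Look up the on-disk paths of all videos in a given genre.
rows = conn.execute(
    """
    SELECT v.file_path
    FROM videos v
    JOIN video_genres vg ON vg.video_id = v.id
    JOIN genres g ON g.id = vg.genre_id
    WHERE g.name = ?
    """,
    ("Tutorial",),
).fetchall()
print(rows)
```

The database only holds the path and the mapping; the video bytes themselves stay on disk.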
Today, with a vast pool of user data, we are in a position to leverage that data like never before, so things are now very different than in 2007. Given the popularity of sites like YouTube and how much people depend on them, there is a vast amount of unstructured data which, if used properly, can give great results. So in our project we store the site admin and reporting data (user database, video locations, genre mapping, etc.) in MySQL and store the unstructured data about user interaction in a NoSQL database. We then use the NoSQL data to do all the analytics and give appropriate results to the user.
They are using MySQL with BigTable.
The user information, such as who uploaded the file, and the file information are all stored in MySQL, and the data itself is stored in BigTable.
I think they are using a database that supports FileTable.
We are looking at acquiring Data Mining software to primarily run predictive analysis processes.
How does the SQL Server Data Mining solution compare to other solutions like SPSS from IBM?
Since SQL Server DM is included in the SQL Server Enterprise license, what would be the justification for spending an extra couple of hundred thousand to buy separate software just to do DM?
I would look into open source options as well, including R, RapidMiner, and Weka.
I would recommend checking out the Rexer survey, as it shows popularity and satisfaction measures for a variety of data mining products:
http://www.kdnuggets.com/2010/03/f-annual-rexer-analytics-data-miner-survey-results.html
Depending on what you are looking to accomplish, and obviously your budget, there are certainly some great things being done in R. Check out Rattle for R and Revolution Computing.
I am a big fan of SPSS, and although I have unfortunately not used their Modeler package, it seems like it may be worth considering. I have used SAS Enterprise Miner, and while it is powerful, I am not a big fan.
I haven't dabbled with Weka much, but I found that RapidMiner has a steep learning curve, although it does have a lot of capability.
If you want to keep everything in the Microsoft stack, check out www.predixionsoftware.com, which is planning the release of a disruptive Excel add-in as an update to the current MS DM add-ins.
You might want to give KNIME a try before paying for something else. It works well with databases and is excellent for exploratory analysis.
I would suggest checking out open-source data mining software. There are some very good open-source packages that are free.
I would start by building some data mining models in SSAS using both Multidimensional and Tabular, and then get a Google Analytics account. I built a social networking website where members had to join, and used Google Analytics to start building reporting dashboards; I have probably built nearly a thousand. It is a good starting point. R is good. Omni used to be the top dog, but Adobe bought them. There are also ClickTracks, QlikView, Sisense, Tableau, and Actuate. However, I would wait and see how the product Microsoft releases turns out. Chances are it will set itself apart, as they have in the BI market, and they have shot up to 2nd in market share and 1st in growth in the database market.