I am a data science student currently researching traffic patterns in Ahmedabad, Gujarat, India, and I need traffic data for the city.
I would greatly appreciate it if you could provide me with any traffic data that is available for Ahmedabad. This can include information on traffic flow, accidents, delays, and any other relevant data.
For example, consider this dataset: https://www.kaggle.com/datasets/fedesoriano/traffic-prediction-dataset
P.S. If not Ahmedabad, any Indian city will work.
I would like to build a real scenario using a real road network and real traffic data. This is for testing as part of some research work.
I know how to extract a map from OSM by following the tutorial on the SUMO website. I have two questions:
1. Where can I get real traffic data that corresponds to the extracted map, such as the average speed on each road, the number of vehicles on each road, vehicle density, etc.?
2. How can I use that data to generate routes to be used in the scenario?
Thanks
There is no general open data source for traffic demand data. It is possible to generate something from population statistics using SUMO's activitygen, or from traffic counts using the dfrouter or the flowrouter. There are also tools for importing origin-destination matrices, but you usually have to obtain most of this input data from the city or region in question.
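As a rough illustration of the second question, here is a minimal Python sketch that turns counts into SUMO flow definitions. It is not what dfrouter or flowrouter do internally; it only writes `<flow>` elements that a routing step (for example SUMO's duarouter) could then expand into routes. The edge IDs and counts are made up, and the attribute names (`begin`, `end`, `number`, `from`, `to`) follow SUMO's flow definition format as I understand it, so check them against the SUMO documentation for your version.

```python
import xml.etree.ElementTree as ET

# Hypothetical hourly counts between pairs of edge IDs. Real values would
# have to come from the city or region, and the edge IDs from your network.
od_counts = {
    ("edge_in_north", "edge_out_south"): 120,
    ("edge_in_east", "edge_out_west"): 45,
}

def build_flows(od_counts, begin=0, end=3600):
    """Return a <routes> element holding one <flow> per origin/destination pair."""
    routes = ET.Element("routes")
    for i, ((origin, dest), count) in enumerate(sorted(od_counts.items())):
        ET.SubElement(routes, "flow", {
            "id": f"flow_{i}",
            "begin": str(begin),      # start of the interval in seconds
            "end": str(end),          # end of the interval in seconds
            "number": str(count),     # vehicles to insert during the interval
            "from": origin,           # origin edge ID (assumed to exist)
            "to": dest,               # destination edge ID (assumed to exist)
        })
    return routes

routes = build_flows(od_counts)
flows_xml = ET.tostring(routes, encoding="unicode")
print(flows_xml)
```

The printed XML can be saved to a file and passed to the routing step together with the network extracted from OSM.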
I am working on a data warehouse project for my final year degree. My intention is to create a data warehouse within the Business Intelligence spectrum that incorporates customer information from a company's point-of-sale data, for example for a given year or location. However, my problem is finding suitable raw data: I have contacted dozens of companies, and all of them refused to share their data!
Is there a source of large, free data sets that would fit the above scenario, i.e. for Business Intelligence purposes?
Any help would be greatly appreciated!
I want to do a Social Network Analysis using Hadoop in the telecom industry. I'm looking for a dataset to work with. Does anyone know a good dataset for analyzing relationships between users?
Many thanks!
Here are some links to social network analysis datasets.
Provided by Arizona State University
UCINET IV
Provided by Stanford University
I have a high traffic web site.
I want to create software which analyses client requests on the fly and decides whether they come from a real user or a botnet bot. To train the neural network to identify legitimate ("good") users, I can use logs from periods when there is no DDoS activity. Once trained, the network would distinguish real users from bots.
What I have:
request URI (and order)
cookie
user agent
request frequency.
Any ideas on how best to design an ANN for this task and how to tune it?
Edit: [in response to comments about the overly broad scope of this question]
I currently have a working C# program which blocks clients on the basis of the frequency of identical requests. Now I'd like to improve its "intelligence" with a classifier based on a neural network.
I don't know how to normalize these inputs for an ANN and I need suggestions in this specific area.
This isn't really suited to neural networks. Neural networks are great provided (as a rough guide):
You can spare the processing power,
The data is not temporal,
The input data is finite.
I don't think that you really pass any of these.
Re: normalizing the inputs: you either map your input data to a set of symbols (which are then turned into numbers), or you map the inputs to a floating point number where the number represents some degree of intensity. You can map any kind of data to any kind of scheme, but you would really only want to use ANNs when the problem solution is nonlinear (that is, the data for one classification CAN'T be clustered on one side of a line with all the data for the other classification on the other side of the line). In both cases you end up with a vector of inputs associated with an output ([BOT, HUMAN], or [BOT, HUMAN, UNKNOWN], or [BOT, PROBABLY-BOT, PROBABLY-HUMAN, HUMAN], etc.).
How do you distinguish between two users coincidentally submitting the exact same book request sequentially in time (let's assume you are selling books)?
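The two input mappings described in this answer (symbols turned into numbers, and a float expressing intensity) could be sketched in Python roughly as follows. The user-agent vocabulary, the cap of 600 requests per minute, and all function names are assumptions made for illustration; they are not taken from the question's C# program.

```python
import math

# Hypothetical vocabulary of user-agent families; anything else maps to "other".
UA_FAMILIES = ["chrome", "firefox", "safari", "other"]

def one_hot(value, vocabulary):
    """Map a symbol to a one-hot list; unknown symbols fall into the last slot."""
    index = vocabulary.index(value) if value in vocabulary else len(vocabulary) - 1
    return [1.0 if i == index else 0.0 for i in range(len(vocabulary))]

def squash_rate(requests_per_minute, cap=600.0):
    """Scale a request rate into [0, 1] on a log scale, saturating at `cap`."""
    clipped = min(max(requests_per_minute, 0.0), cap)
    return math.log1p(clipped) / math.log1p(cap)

def encode_request(ua_family, has_cookie, requests_per_minute):
    """Build the input vector: one-hot user agent, cookie flag, scaled rate."""
    return (one_hot(ua_family, UA_FAMILIES)
            + [1.0 if has_cookie else 0.0]
            + [squash_rate(requests_per_minute)])

vector = encode_request("firefox", True, 30)
print(vector)
```

The log scale is one possible choice: it keeps the difference between 1 and 10 requests per minute visible while still bounding very high rates. The request URI order is temporal and is not handled by this sketch.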
I'm in the planning stages of building a SQL Server DataMart for mail/email/SMS contact info and history. Each piece of data is located in a different external system. Because of this, email addresses do not have account numbers and SMS phone numbers do not have email addresses, etc. In other words, there isn't a shared primary key. Some data overlaps, but there isn't much I can do except keep the most complete version when duplicates arise.
Is there a best practice for building a DataMart with this data? Would it be an acceptable practice to create a key table with a column for each external key? Then, a unique primary ID can be assigned to tie this to other DataMart tables.
Looking for ideas/suggestions on approaches I may not have yet thought of.
Thanks.
The email address or phone number itself sounds like a suitable business key. Typically a "staging" database is used to load the data from multiple sources and then assign surrogate keys and do other transformations.
Are you familiar with data warehouse methods and design patterns? If you don't have previous knowledge or experience then consider hiring some help. BI / data warehouse projects have a very high failure rate and mistakes can be expensive.
Found more information here:
http://en.wikipedia.org/wiki/Extract,_transform,_load#Dealing_with_keys
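A key table with one column per external key, plus a surrogate key assigned during staging, might look something like the sketch below. It uses Python's built-in sqlite3 as a stand-in for SQL Server, and the table name, column names, and the `surrogate_for` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contact_key (
        contact_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        email        TEXT UNIQUE,   -- business key from the email system
        sms_phone    TEXT UNIQUE,   -- business key from the SMS system
        mail_account TEXT UNIQUE    -- business key from the postal-mail system
    )
""")

# Whitelist of key columns, so the column name is never taken from raw input.
COLUMNS = {"email", "sms_phone", "mail_account"}

def surrogate_for(conn, column, value):
    """Return the surrogate ID for a business key, creating a row if needed."""
    if column not in COLUMNS:
        raise ValueError(f"unknown key column: {column}")
    row = conn.execute(
        f"SELECT contact_id FROM contact_key WHERE {column} = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(f"INSERT INTO contact_key ({column}) VALUES (?)", (value,))
    return cur.lastrowid

a = surrogate_for(conn, "email", "pat@example.com")
b = surrogate_for(conn, "sms_phone", "+15550100")
c = surrogate_for(conn, "email", "pat@example.com")  # same business key, same ID
print(a, b, c)
```

Other DataMart tables would then reference `contact_id` only. If a later load shows that an email address and a phone number belong to the same contact, the two rows can be merged in staging without touching the fact tables' structure.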
Well, with no other information to tie the disparate pieces together, your datamart is going to be pretty rudimentary. You'll be able to get the types of data (sms, email, mail), metrics for each type over time ("this week/month/quarter/year we averaged 42.5 sms texts per day, and 8000 emails per month! w00t!"). With just phone numbers and email addresses, your "other datamarts" will likely have to be phone company names, or internet domains. I guess you could link from that into some sort of geographical information (internet provider locations?), or maybe financial information for the companies. Kind of a blur if you don't already know which direction you want to head.
To be honest, this sounds like someone high-up is having a knee-jerk reaction to the "datamart" buzzword coupled with hearing something about how important communication metrics are, so they sent orders on down the chain to "get us some datamarts to run stats on all our e-mails!"
You need to figure out what it is that you or your employer is expecting to get out of this project, and then figure out if the data you're currently collecting gives you a trail to follow to that information. Right now it sounds like you're doing it backwards ("I have this data, what's it good for?"). It's entirely possible that you don't currently have the data you need, which means you'll need to buy it (who knows if you could) or start collecting it, in which case you won't have nice looking graphs and trend-lines for upper-management to look at for some time... falling right in line with the warning dportas gave you in his second paragraph ;)