This is my scenario:
I'm developing a web app that lists all the details of a car the user picks from a list. I have a database of car models, makes, sizes, prices, etc., plus the price trend for the past 5 years. You may assume I have a few such tables and that the data volume is in the tens of thousands of records.
My application should let the user pick one car model and optionally provide an address. From just this input, I want to generate a PDF report with the following information:
Comparison of the selected car model with other cars manufactured in the same country (e.g., if the user selected Honda, I want to compare it with Toyota, which comes from the same country)
Comparison of the selected car with other cars of a similar type (e.g., sedan vs. sedan)
Price trend of the car for the last 5 years
Nearest car workshops in the user's neighbourhood, within a radius of 10 km (if the user has given me an address)
I will be drawing several other pieces of data from my database.
I would like to present this report quickly, say within 3 minutes. So the question is: what software/tools/programs/database should I be using, taking into consideration the amount of data and the need to deliver the PDF report in the fastest possible time?
There are a whole lot of possibilities. You can use PHP, Java, .NET and so on for the web application, and MySQL, SQL Server, Oracle, etc. as the database (if the data is really big and grows rapidly every day, you may also consider HBase). It depends on how soon you want your product on the market, how scalable it needs to be, and how comfortable you are with each of those technologies.
Some technologies make it easy to build a nice user interface; others do not, but are strong in other areas of web application development.
How much money and time you have for development and licensing also plays a role in answering this question.
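Whatever stack you pick, the "workshops within 10 km" part of the report is a small computation at this data volume. Here is a minimal sketch in Python, assuming the user's address has already been geocoded to latitude/longitude and that each workshop is stored with its coordinates. The function names and the sample coordinates are made up for illustration; the distance is the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def workshops_within(user, workshops, radius_km=10.0):
    """Return (distance_km, name) pairs inside the radius, nearest first.

    user is a (lat, lon) tuple; workshops is an iterable of (name, lat, lon).
    """
    hits = []
    for name, lat, lon in workshops:
        distance = haversine_km(user[0], user[1], lat, lon)
        if distance <= radius_km:
            hits.append((distance, name))
    return sorted(hits)


# Hypothetical sample data, not from the question.
user_location = (1.3521, 103.8198)
sample_workshops = [
    ("A", 1.3600, 103.8300),
    ("B", 1.4500, 103.8200),
    ("C", 1.3000, 103.8000),
]
nearby = workshops_within(user_location, sample_workshops)
```

With only tens of thousands of rows, a linear scan like this should finish well inside the 3-minute budget; a spatial index in the database would only become worthwhile at much larger volumes.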
Assuming an e-commerce web app has a high volume of requests, how do I prevent two users from buying the only product left? Should I check the quantity when the item is added to the shopping list, or at payment? Is using a field to record the quantity of the selected product in the DB a bad approach? How do large e-commerce web apps like Amazon deal with this conflict problem?
Several options that I know of:
For an RDBMS that supports ACID, you can use optimistic locking on the product table. Unless many users very often hit the buy button on the same product at nearly the same time, it should work pretty well. (How many users counts as "many" is something you have to measure. I think 1k should be no problem, but that is just my guess, so don't take it for granted.)
Do not check at all and let users buy. Adjust the business flow to handle it. For example, when a user hits the buy button, tell him that his order has been accepted and will be processed, but do not guarantee that he will get the product. If you later find that there is not enough inventory to ship the product to him, send an email to apologise and refund him.
In real business it is also common to let the product inventory go negative and keep accepting orders, but tell the user he will get the product XXX days later. The business can then produce or order more of the product from the supplier after receiving the money.
If you buy an iPhone on the Apple web site, it also works like this.
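To make the optimistic locking option above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `product` table, its `version` column and the `reserve_one` function are assumptions made for illustration; the same idea carries over to any ACID database. The update only succeeds if the version read earlier is still current, so when two buyers race for the last unit, exactly one of them wins.

```python
import sqlite3


def reserve_one(conn, product_id):
    """Try to take one unit of stock.

    Returns True on success, False if the product is sold out or another
    buyer changed the row between our read and our write.
    """
    row = conn.execute(
        "SELECT quantity, version FROM product WHERE id = ?",
        (product_id,),
    ).fetchone()
    if row is None or row[0] < 1:
        return False
    quantity, version = row
    # The WHERE clause re-checks the version we read; a concurrent update
    # would have bumped it, so this statement would then match zero rows.
    cur = conn.execute(
        "UPDATE product SET quantity = ?, version = ? "
        "WHERE id = ? AND version = ?",
        (quantity - 1, version + 1, product_id, version),
    )
    conn.commit()
    return cur.rowcount == 1


# Illustrative setup: one product with a single unit left.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO product VALUES (1, 1, 0)")
conn.commit()
```

When `reserve_one` returns False because of a lost race rather than empty stock, the caller can simply re-read and retry, or tell the user the item has just sold out.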
It really depends on the number of concurrent users. With millions of users, a NoSQL approach is preferred for managing the basket with eventual consistency, and the buying process then runs on an ACID database to ensure the product can actually be sold.
For fewer users, you can rely on an ACID database alone.
If you are not sure, you may go with a database that has ACID capabilities but also lets you work in an eventually consistent way, or that supports sharding for scalability. To my knowledge Oracle can do all three: asynchronous commits (COMMIT WRITE NOWAIT), regular COMMIT, and sharded deployments.
HTH
We're building an app that will have a number of games. Kids will learn math as they play these games. All the user profile data, game data and lesson/question data are stored in the app and will sync to a MySQL database on the server side.
There is also a ton of event data that we would like to capture and analyze to improve our games. These events could be the start of a lesson, touching a game object, choosing the correct game object but targeting it wrongly, answering correctly but timing out, and so on. We expect hundreds of rows for each game a kid plays. The data stored will also depend on the type of event.
The database should allow us to analyze the data and answer questions such as: which games are tough on kids, which lessons are too easy, are kids from some countries finding some of the lessons tough, how long is each game able to hold a kid's attention, and so on.
Which database would allow us to store so many different types of events, scale to millions of rows a day, and support all these kinds of analysis? Given the changing nature of the data model, NoSQL seems an obvious choice. But which one would allow us to do all this analysis? Or should we go with Hadoop/Hive?
Thanks in advance.
You can do this using Hadoop/Hive, but you won't get real-time performance, as Hive is best suited to batch processing. HBase would be a better choice in such a scenario. You could create an OLAP-style data cube whose dimensions are the information you specified, such as session info, info about each kid, etc. Or you could serialize all of this information as JSON objects and store them in HBase cells. You could also store each of these events in individual cells, but that would consume unnecessary space and be less efficient when fetching the data back.
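As a rough sketch of the JSON option, here is how differently shaped events could be serialized into one value each, in Python. The row key layout, the field names and the in-memory dict standing in for an HBase table are all assumptions made for illustration, and no real HBase client call is shown. HBase keeps rows sorted by row key, so a zero-padded timestamp at the end of the key keeps one kid's events for one game together and in time order.

```python
import json


def row_key(kid_id, game_id, timestamp_ms):
    """Hypothetical key layout: kid, then game, then zero-padded timestamp."""
    return f"{kid_id}#{game_id}#{timestamp_ms:013d}"


def encode_event(event_type, payload):
    """Serialize one event; the payload fields can differ per event type."""
    return json.dumps({"type": event_type, **payload}, sort_keys=True)


# A plain dict stands in for the table; a real client would issue a put here.
table = {}


def put_event(kid_id, game_id, timestamp_ms, event_type, payload):
    table[row_key(kid_id, game_id, timestamp_ms)] = encode_event(event_type, payload)


def count_events(event_type):
    """Example analysis: how many stored events have the given type."""
    return sum(1 for value in table.values()
               if json.loads(value)["type"] == event_type)


put_event("kid42", "game7", 1700000000000, "lesson_start", {"lesson": "fractions-1"})
put_event("kid42", "game7", 1700000005000, "answer_timeout",
          {"question": 3, "correct": True, "elapsed_ms": 30000})
```

The trade-off is that JSON values are opaque to the store: a question such as "which games time out most" means scanning and decoding, which is why the fields you filter on most often are better placed in the row key or in their own columns.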
HTH
Let's say I am developing a meta-search app for hotel bookings in Ruby on Rails, using the APIs of Expedia, Booking, etc.
What is the best way to consume the APIs if I want to render a list of hotels by "arrival date", "departure date", and "longitude"/"latitude" within a 20-mile radius?
I have every coordinate needed for such searches (around 10 thousand coordinates).
A search returns more than 100 hotels within 20 miles of each coordinate, so there are tens of thousands of unique hotels.
Should I pre-seed my database with all hotels for each coordinate and fetch only the number of rooms available and other dynamic variables, or would it be better to render the list dynamically through JavaScript?
Also, what is the best database for this task? I am interested in the Expedia affiliate program in particular.
If you are interested, I found the answer to my question.
Take a closer look at the Expedia API documentation at http://developer.ean.com/docs/read/hotels, especially the databases section (you need to sign in). I found that there are several approaches to this problem, depending on the application. I settled on importing the whole property database, which is fully accessible on the Expedia site, and fetching only the dynamic parameters such as room availability, prices, etc. However, this will require frequent updates to the database.
I am looking for a method of dynamically linking product information based on the name of the product.
For example: User types in "Playstation 3", the site would then go out and grab any information it can, such as picture, retail price, etc. Ideally, it would let you choose the correct item (returns both ps3 controller and ps3 console, user can choose which). It would then use this information in a product listing.
The easiest way I can think to implement this is to use the existing API of a major retailer such as Amazon. I have a couple completely different ideas for sites, one of which would involve selling from amazon (which I would assume they would be ok with) and another which would only be data mining the information. I am concerned they would not take it very kindly if I was just stealing their images and descriptions.
Is there another, maybe less "sneaky", way to accomplish this that wouldn't be legally frowned upon?
Many web-commerce companies expose their product data through an API: eBay, Etsy, and Amazon all have API feeds for their products. If you can convince the company to give you access to its API (usually they will issue you a key/password), you can query its product data directly, typically read-only. Depending on the company, you can simply write to them to request access.
You are correct when you say that most companies wouldn't take kindly to someone web-scraping their product directory and re-using it. That is unethical, and could lead to big trouble with larger companies with a significant legal presence.
On the other hand, there is nothing to prevent you from cobbling together several API feeds into a Mash-Up - try Yahoo Pipes! to learn the basics of API/Mash-Up integration:
Yahoo Pipes:
http://pipes.yahoo.com/pipes/
Here is the link to Amazon's Product Advertising API program:
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
Good luck, and happy development!
Many online retailers provide a product feed - either well-publicized (William M-B has listed some examples), or sorta-kinda hidden, for the purposes of affiliate marketing. They usually have terms of use around those product feeds, describing in detail what you're allowed to do with them, and exactly how many of your limbs are at risk if you don't play by their rules.
However, the mechanism you're describing sounds remarkably similar to a search engine; there's a well-established precedent for search engines indexing sites, and using their content to reason about the underlying site. Get a lawyer to validate this, but there's a good chance that your intended purpose falls under "fair use".
I'm a representative of http://aerse.com.
We are building a service that does the following:
searches for a product by name, for example: galaxy s3, galaxy s 3 or galaxy sIII
returns technical specifications (CPU, RAM, etc.) and product images (thumbnails and high-res images)
provides an API: http://aerse.com/p
deals with legal issues, provides licenses, etc.
So, I've just decided to build my own fantasy sports web site.
You know the type of site where you can pick players from your favourite league and depending on how they do you get a certain amount of points in your team. There are fantasy teams for all types of leagues and sports, I'm sure you know what I'm talking about.
I haven't settled for a specific sport or league just yet because I want the basics to fit to different types of team-based sports.
I have a few expectations of it myself. If you can come up with any others, I'll be glad to hear them.
I expect the site to be dynamic and have many visits during a game, but almost only static content otherwise.
Player points should be updated in real-time during a game.
I would need a list that shows each game being played and the points of every player in that game. It should also show minutes played, goals, assists etc.
Each registered user would be able to see the points and players of his/her team updated in real time.
I need the site to scale so that if I start with 1000 teams I could end up with 5 million.
I probably won't be needing language support right now, but who knows in the future.
Based on these prerequisites what would be best to use in terms of language (php, .NET, drupal or other cms's), database (mysql, sqlserver, xml) and other techniques?
Maybe it doesn't really matter what I use?
I guess the dynamic and real time update of each player's points is where I need help the most.
Thanks in advance!
/Niklas
EDITED
I could use an array with the following data for a specific game week:
Player ID
Minutes played
Sport specific points(goals, assists, penalties, yellow cards, man of the match bonus) etc.
Total points in current game week
When the game is over I'd add these to a DB and sum this data with any previous game weeks, plus player value, the number of teams that have selected this player, etc.
You are probably going to have to go down the custom route for your "Game" code rather than using a CMS, although depending on your experience, you may be able to leverage a framework (e.g. CodeIgniter) to speed up some of your dev time.
This type of site would be pretty language agnostic, however it would depend on the actual numbers of users you are looking at as to the most scalable solution / set of techniques to deploy.
One of the biggest considerations you are going to have to look at would be the design of the data model, and the platform that this sits on.
If you want to be processing near to realtime updates, you are going to want to focus your efforts on making the DB queries / processing the most efficient possible.
One big consideration that you have not discussed here is caching. There is some data on your site that I am sure will be static for long periods of time (such as weekly totals etc), and there is data that will be very much real time (but only during match days).
However, during match days you will have a lot more traffic than non match days, and you will therefore have a lot of requests for the same data in a short period of time. Therefore, employing a good caching strategy will save you masses of CPU power. What I am thinking of, is to calculate a player's score and then cache for 1 minute at a time, therefore each time that specific player is requested, you are retrieving from a cache, rather than recalculating each time.
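To illustrate the one-minute caching idea, here is a minimal in-process sketch in Python. The class and method names are made up for illustration, and on a real multi-server site you would more likely put this in a shared cache such as memcached or Redis, but the logic is the same: recompute a player's score only when the cached copy is older than the time-to-live.

```python
import time


class TTLCache:
    """Cache computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested
        self._entries = {}          # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute(key) if it is
        missing or older than the TTL."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = compute(key)
        self._entries[key] = (now, value)
        return value


def calculate_player_score(player_id):
    """Placeholder for the expensive query and points calculation."""
    return 0


score_cache = TTLCache(ttl_seconds=60.0)
points = score_cache.get_or_compute(10, calculate_player_score)
```

However many users request player 10 within the same minute, the expensive calculation runs once; the price is that displayed points can be up to a minute stale, so the TTL is the knob for trading freshness against load.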