How to get site data

How to get site data - analytics

I have a site that features other websites and displays details about them. Now I want to get more information about the sites I feature, such as page views, visits, etc.
How do I do that? Is there an API for it?

First of all, information about how many visits, pageviews, etc. other websites have is generally not publicly available, because (obviously) many companies / website owners don't want to share that information and there's no general-purpose way of getting it.
That said, here's a list of websites that attempt to display that kind of information:
Quantcast
Alexa
Compete.com
Google Ad Planner
I'm sure there are others, but these are the ones I'm familiar with. Some of them have APIs, but keep in mind that none of them provides accurate data, only estimates, simply because exact numbers are unknown unless the website owner publishes them.

Have a look at Google Analytics. It can give you information about visits, pageviews, trends, web browsers used, screen resolutions and much more! Keep in mind that it only reports on sites where you can install its tracking code.

Related

Multiple websites on a single GA4 account

I'm working in a company which has multiple international websites, and I wanted to ask whether it's better to have just one Google Analytics 4 account for all the websites or to keep an account for every website.
We currently have 12 websites, and the number will grow this year, so I want to set things up properly before all of this.
Thank you,
I tried using a single account for two websites, but I couldn't split the metrics and the events were not working.

You can create a Google Analytics account using this link: https://analytics.google.com/analytics/web/provision/#/provision
BUT, I would create a separate account for every website. Otherwise it will probably cause problems in the future when reading data from the API (I mean in your back-end code) and when analysing the retrieved data, since every time you will need to filter by domain, etc.
Plus, there are limits on creating custom dimensions and metrics. Of course, I'm assuming you are talking about separate businesses, or rather, different clients.
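To illustrate the extra filtering step a shared property implies, here is a minimal Python sketch. It assumes you have exported raw events that carry the site's hostname; the `Event` record and its fields are hypothetical, not the GA4 export schema. With one property per site this split is unnecessary; with a shared property, every per-site report needs it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    hostname: str  # which site the event came from (hypothetical field)
    name: str      # event name, e.g. "page_view"


def page_views_by_site(events):
    """Count page_view events per hostname in a combined export."""
    counts = Counter()
    for e in events:
        if e.name == "page_view":
            counts[e.hostname] += 1
    return dict(counts)


events = [
    Event("example.fr", "page_view"),
    Event("example.de", "page_view"),
    Event("example.fr", "page_view"),
    Event("example.fr", "scroll"),
]
print(page_views_by_site(events))
```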

Thank you for your answer.
I already have an account for every website, so one property = one website.
But what I wanted to know is whether it's better to have a single property for all my websites, so that all the data is in one property.
Thank you,

Is it alright to track your users' actions on the site for analytics purposes?

We use a tool that tracks individual users' mouse movements and clicks on our site. Right now it only tracks anonymous visitors, but we're thinking of using it to track specific logged in users' data. We'd be using it for analytics, but we'd like to have the data in case we need to analyze how a particular person uses the site.
Are people, in general, alright with this? Does this constitute privacy infringement?

The short answer is: it is your site, so for the most part (for now) you can track whatever you want on it.
However, some things to consider...
a) 3rd party analytics tools have their own privacy policies and Terms of Service that may or may not allow this, so if you are using something like Google Analytics, Omniture SiteCatalyst, WebTrends, Yahoo Web Analytics, etc., then you need to read over their Privacy Policy and Terms of Service to make sure you are allowed to track this sort of thing. Offhand, I don't think any of the ones I mentioned disallow tracking mouse movements/clicks specifically (in fact, some of them have features/plugins for it, called "clickmap" tracking or similar), but some do have restrictions on other data you may couple with this. For example, I know Google does not allow you to associate any data with the user's IP address. You cannot send it to GA in a custom variable, nor can you store it on your own server in any way that lets you associate it with data you send to GA (for example, storing the user's IP in your own database along with a unique ID, sending that unique ID to GA, and then looking up the IP by that ID).
b) Privacy is indeed a concern that is currently being discussed by the powers-that-be, and your ability to track certain things may indeed be limited in the future. For now, it's mostly about personally identifiable information, and it's mostly happening in Europe, and tracking mouse movement/clicks generally isn't personally identifiable, but who knows what the future may bring.
c) Make sure you understand the costs involved in tracking mouse movements/clicks. To track something, a request has to be made and data sent somewhere. The more granular the data, the more requests and/or data need to be sent. Whether it is your own home-baked tracking solution on your own server or a 3rd party service, this will cost something one way or the other. Imagine sending a request to a server for every x,y position of the mouse as it moves... this could quickly add up, and a lot of 3rd party solutions limit how many requests can be made per visit(or) or per day on an account.
d) On that note, if you are using a 3rd party solution, tracking something this granular may affect the tracking of more important stuff. As mentioned in "c", many 3rd party solutions limit how many requests can be made per visit(or) or per day on your account, etc., and if you hit the limit, any requests after that won't be tracked. Imagine the tracking on your sale confirmation page, which records the details of a sale (very important tracking), being tossed out because of too many mouse-movement requests on some random page...
e) On that note... consider how actionable tracking mouse movements and clicks really is for you. This is a question you have to ask yourself whenever you want to track something: "How actionable is this?" Basically, imagine yourself having the tracking in place and looking at the data... then what? What will you do with that data? Assuming the ultimate goal is to make more money, increase conversions on your site, etc., do you really think knowing the paths a mouse cursor took on a given webpage will help you increase sales/conversions? How will you know whether the mouse movements are related to content on your page, or whether they were just random jerks while reading content, making room on a desk, etc.? At best, the data will be polluted...
Clicks on links or specific action buttons on a page? Sure, those are certainly worth tracking. And most 3rd party solutions automatically track a lot of that stuff, or offer custom coding solutions for manual wiring up of things. And there are plenty of reports that can be made showing activity from them.
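Points c) and d) can both be handled before anything is sent. Below is a minimal sketch in Python (on a real site this logic would live in the page's JavaScript): a throttle that keeps at most one mouse position per interval, and a hypothetical per-visit budget that reserves a few request slots for important hits such as a sale confirmation. The 0.25-second interval, the limit of 10 and the reserve of 2 are made-up values, not any vendor's actual quota.

```python
class MouseSampler:
    """Throttle: keep at most one mouse position per `interval` seconds."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last_kept = None  # timestamp of the last position we kept

    def offer(self, t: float, x: int, y: int):
        """Return (t, x, y) if this position should be sent, otherwise None."""
        if self._last_kept is None or t - self._last_kept >= self.interval:
            self._last_kept = t
            return (t, x, y)
        return None


class RequestBudget:
    """Hypothetical per-visit request budget with slots reserved for key hits."""

    def __init__(self, limit: int, reserved: int):
        self.limit = limit        # max requests per visit (assumed value)
        self.reserved = reserved  # slots that low-priority hits may not use
        self.used = 0

    def allow(self, high_priority: bool) -> bool:
        """Approve and count the request if the budget permits it."""
        remaining = self.limit - self.used
        needed = 1 if high_priority else self.reserved + 1
        if remaining >= needed:
            self.used += 1
            return True
        return False


# Simulate one second of mousemove events arriving every 10 ms.
raw = [(i / 100, i, i) for i in range(100)]
sampler = MouseSampler(interval=0.25)
kept = [p for p in (sampler.offer(*e) for e in raw) if p is not None]

budget = RequestBudget(limit=10, reserved=2)
# 20 low-priority hits (e.g. mouse data), then one sale confirmation.
mouse_sent = sum(budget.allow(high_priority=False) for _ in range(20))
sale_sent = budget.allow(high_priority=True)
print(len(raw), len(kept), mouse_sent, sale_sent)
```

Here the throttle cuts 100 raw positions down to 4, and the budget drops 12 of the 20 low-priority hits while the sale confirmation still gets through.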

How can I get product information into a database without having to populate it manually?

I am looking for a method of dynamically linking product information based on the name of the product.
For example: User types in "Playstation 3", the site would then go out and grab any information it can, such as picture, retail price, etc. Ideally, it would let you choose the correct item (returns both ps3 controller and ps3 console, user can choose which). It would then use this information in a product listing.
The easiest way I can think of to implement this is to use the existing API of a major retailer such as Amazon. I have a couple of completely different ideas for sites: one would involve selling from Amazon (which I assume they would be OK with), and the other would only be data mining the information. I am concerned they would not take it very kindly if I was just stealing their images and descriptions.
Is there another, maybe less "sneaky", way to accomplish this that wouldn't be legally frowned upon?
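Assuming you already have product data from a feed or API you are permitted to use, the "let the user choose the correct item" step can be as simple as matching the typed name against candidate names and listing the hits. A minimal Python sketch; the catalog entries and prices are made-up placeholders, and the word-substring matching is deliberately naive (for example, "3" also matches "360").

```python
def find_candidates(query: str, catalog: list) -> list:
    """Return catalog entries whose name contains every word of the query."""
    words = query.lower().split()
    # Naive substring test: "3" would also match "360".
    return [item for item in catalog
            if all(w in item["name"].lower() for w in words)]


# Made-up placeholder data standing in for an API response.
catalog = [
    {"name": "PlayStation 3 Console", "price": 299.99},
    {"name": "PlayStation 3 Wireless Controller", "price": 49.99},
    {"name": "Xbox 360 Console", "price": 199.99},
]

matches = find_candidates("Playstation 3", catalog)
for i, item in enumerate(matches, start=1):
    print(f"{i}. {item['name']} (${item['price']:.2f})")
```

The user then picks one of the numbered entries, and only that item's data goes into the product listing.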

Many e-commerce companies expose their product data through an API: eBay, Etsy, and Amazon all have API feeds for their products. If you can get the company to grant you access to their API (usually they will give you a key/password), then you can query their product data directly, typically at the read-only level. Depending on the company, you may simply be able to write to them to request access.
You are correct when you say that most companies wouldn't take kindly to someone web-scraping their product directory and re-using it. That is unethical, and could lead to big trouble with larger companies with a significant legal presence.
On the other hand, there is nothing to prevent you from cobbling together several API feeds into a Mash-Up - try Yahoo Pipes! to learn the basics of API/Mash-Up integration:
Yahoo Pipes:
http://pipes.yahoo.com/pipes/
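Whatever tool you use, the core of a mash-up is joining records from several feeds on a shared key. A minimal Python sketch, assuming the feeds have already been fetched and parsed into lists of dicts and that both carry a `upc` field (these field names are hypothetical; real feeds name and format their identifiers differently).

```python
def merge_feeds(primary: list, secondary: list, key: str = "upc") -> list:
    """Join two product feeds on `key`; fields from `primary` win on conflicts."""
    merged = {}
    for item in secondary:
        merged[item[key]] = dict(item)
    for item in primary:
        # Start from the secondary record (if any) and overlay primary fields.
        merged.setdefault(item[key], {}).update(item)
    return list(merged.values())


feed_a = [{"upc": "111", "title": "Widget", "price": 9.99}]
feed_b = [
    {"upc": "111", "title": "Widget (old title)", "image": "widget.jpg"},
    {"upc": "222", "title": "Gadget"},
]
for product in merge_feeds(feed_a, feed_b):
    print(product)
```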
Here is the link to Amazon's Product Advertising API program:
https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
Good luck, and happy development!

Many online retailers provide a product feed - either well-publicized (William M-B has listed some examples), or sorta-kinda hidden, for the purposes of affiliate marketing. They usually have terms of use around those product feeds, describing in detail what you're allowed to do with them, and exactly how many of your limbs are at risk if you don't play by their rules.
However, the mechanism you're describing sounds remarkably similar to a search engine; there's a well-established precedent for search engines indexing sites, and using their content to reason about the underlying site. Get a lawyer to validate this, but there's a good chance that your intended purpose falls under "fair use".

I'm a representative of http://aerse.com.
We are building a service that does the following:
searches for products by name, for example: galaxy s3, galaxy s 3 or galaxy sIII
returns technical specifications (CPU, RAM, etc.) and product images (thumbnails and high-res images)
provides an API: http://aerse.com/p
deals with legal issues, provides licenses, etc.

How to get book metadata?

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement---sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.
To clarify, I do not need to search the entire database of books---only a limited subset which have been inputted, as in a book collection. The database would simply allow me to tag the inputted books with the necessary metadata to enable search on that subset of books. So scale is not the issue here---getting the metadata is.
The options I have tried are:
Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service require.
Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.
I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata.

Since it is unlikely that you need to retrieve the same 500 books every day, store the data retrieved from isbndb.com in your own database and fill it up book by book.
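A minimal sketch of that approach in Python with `sqlite3`: look the ISBN up locally first and only call the remote service on a miss. The `fetch` callable stands in for your isbndb.com request (not shown, and hypothetical here), and storing the response as a JSON string is an assumption about what you want to keep.

```python
import sqlite3
from typing import Callable, Optional


def get_book(conn: sqlite3.Connection, isbn: str,
             fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return cached metadata for `isbn`, calling `fetch` only on a cache miss."""
    row = conn.execute("SELECT data FROM books WHERE isbn = ?", (isbn,)).fetchone()
    if row is not None:
        return row[0]
    data = fetch(isbn)  # e.g. the isbndb.com API call (not shown)
    if data is not None:
        conn.execute("INSERT INTO books (isbn, data) VALUES (?, ?)", (isbn, data))
        conn.commit()
    return data


conn = sqlite3.connect(":memory:")  # use a file path for a persistent cache
conn.execute("CREATE TABLE IF NOT EXISTS books (isbn TEXT PRIMARY KEY, data TEXT)")

calls = []


def fake_fetch(isbn):
    """Stand-in for the remote lookup; records each call it receives."""
    calls.append(isbn)
    return '{"title": "Example"}'


get_book(conn, "9780000000002", fake_fetch)
get_book(conn, "9780000000002", fake_fetch)  # served from the local table
print(len(calls))
```

Only books that are new to your collection cost a remote request, so the daily volume drops to the number of newly added books.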

Instead of scraping Amazon, you can use the API they expose for their affiliate program: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
It allows about 3k requests per hour and returns well-formed XML. It requires you to link to the book you show information about, and you must state that you are an affiliate partner.

This might be what you're looking for. They even offer a complete download!
https://openlibrary.org/data
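If you go with the complete download, you can stream the dump and keep only the editions you care about. A minimal Python sketch, assuming the dump's documented layout of one record per line with five tab-separated columns (type, key, revision, last_modified, JSON) and that edition records list their ISBNs under `isbn_10`/`isbn_13`; check the current dump documentation before relying on this. The file name in the usage comment is an assumption.

```python
import json
from typing import Iterable, Iterator


def editions_for_isbns(lines: Iterable[str], wanted: set) -> Iterator[dict]:
    """Yield edition records from an Open Library dump whose ISBN is in `wanted`."""
    for line in lines:
        # Columns: type, key, revision, last_modified, JSON
        parts = line.rstrip("\n").split("\t", 4)
        if len(parts) != 5 or parts[0] != "/type/edition":
            continue
        record = json.loads(parts[4])
        isbns = record.get("isbn_10", []) + record.get("isbn_13", [])
        if wanted.intersection(isbns):
            yield record


# Usage with a real dump file (file name assumed):
#   import gzip
#   with gzip.open("ol_dump_editions_latest.txt.gz", "rt", encoding="utf-8") as f:
#       for record in editions_for_isbns(f, my_isbns):
#           ...
sample = [
    "/type/edition\t/books/OL1M\t3\t2020-01-01T00:00:00\t"
    + json.dumps({"title": "Book A", "isbn_13": ["9780000000002"]}),
    "/type/author\t/authors/OL1A\t1\t2020-01-01T00:00:00\t{}",
]
print([r["title"] for r in editions_for_isbns(sample, {"9780000000002"})])
```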

It seems that a lot of libraries and other organisations make information such as ISBNs available through MAchine-Readable Cataloging, aka MARC; you can find more information about it here as well.
Now that I knew the "right" term to search for, I discovered WorldCat.org.
Maybe this whole MARC thing gives you a new kind of an idea :)

Best practices on what data to collect in an in-app web analytics

In our SaaSy webapp we need to collect Google Analytics-like data (like which pages were visited, how many 404s there were, etc.). I wonder if there are any best practices on what pieces of information should be collected (like IP, User Agent, etc.) and how these logs should be stored. The requirements for which statistics we're going to display are not yet fixed, but I want to have a starting point.

Tracking for the sake of tracking is pointless. The point of tracking activity on your site is to answer specific business questions, such as how many people are buying your product, how far they are getting in your sales funnel, or other events like signing up for a newsletter, etc. What you should be doing is asking the people who make business decisions what they need/want to know, and going from there.
Having said that, most ad-hoc reports can be generated from basics like the URL and timestamp. The ability to parse specific variables from the URL and categorize them and their values is handy for campaign tracking. Tracking IP addresses is good for debugging and for finding out what country/region/market the user is coming from. The referring URL is good for tracking where the user came from on the internet (another site, paid vs. organic search, a campaign, etc.).
And then throw a couple of variables into the mix. Allow for the ability to populate variables with arbitrary information (like product IDs, etc...) that can be sent to you and stored, so you can see things like how many times a product was viewed or purchased, how much it cost, etc...
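As a starting point, the fields listed above could be captured in one record per hit. A minimal Python sketch; the `Hit` class is hypothetical, and treating `utm_`-prefixed query parameters as campaign variables is an assumption borrowed from Google Analytics' naming convention, not a requirement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


@dataclass
class Hit:
    url: str
    timestamp: datetime
    ip: str
    user_agent: str
    referrer: str = ""
    custom: dict = field(default_factory=dict)  # arbitrary variables, e.g. product IDs

    def campaign(self) -> dict:
        """Extract utm_* query parameters as campaign variables (assumed convention)."""
        query = parse_qs(urlparse(self.url).query)
        return {k: v[0] for k, v in query.items() if k.startswith("utm_")}


hit = Hit(
    url="https://app.example.com/pricing?utm_source=newsletter&utm_medium=email&page=2",
    timestamp=datetime.now(timezone.utc),
    ip="203.0.113.7",
    user_agent="Mozilla/5.0",
    referrer="https://mail.example.net/",
    custom={"product_id": "SKU-123"},
)
print(hit.campaign())
```

Storing the raw URL alongside the parsed values keeps the option open to re-parse old hits once the reporting requirements are settled.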
But anyway, to answer your question: ultimately, "best practice" is first sitting down with the guys in suits, asking what they want/need to know, and working with them to find out whether what they want to know is just silly or actually actionable (for example, knowing the number of pageviews is okay, but how actionable is it really? What's MORE actionable is knowing how many of xyz are being sold, or where on your site people are abandoning you, so you can streamline your site, maybe decide your product or offer sucks and needs to be revisited, etc.).
I have to ask though...is there a particular reason you wish to create your own tracking tool as opposed to using or investing in one of the many tools already out there? There is Google Analytics (GA), Yahoo Web Analytics (YWA), Omniture SiteCatalyst, Webtrends to name a few. Some are free, some cost money, but it is an investment that yields real returns if used properly.
