Analytics service that tells what users do inside a mobile app

I understand that Flurry provides analytics including user retention, daily users, average active session time, etc. But can I use it, or some other service, to tell me how many users have clicked a certain button, how much time users spend on a specific view, and so on?

Yes, you can. Localytics (I work there), Flurry and other app analytics services support the recording of "events" that do exactly what you described. The events also have "attributes" that you can use to track additional details, like time (which we generally recommend bucketing -- i.e., 0-5 seconds, 6-10 seconds, etc.). Here's a link to our integration docs: http://www.localytics.com/docs/iphone-integration/
You might also want to track screens, and then combine events and screens into conversion funnels.
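For illustration, here is a minimal Python-style sketch of the bucketing recommended above; `tag_event` is a hypothetical stand-in for whatever event-tagging call your analytics SDK exposes:

```python
def bucket_seconds(seconds):
    """Map a raw duration to a coarse bucket so the attribute stays low-cardinality."""
    for upper, label in [(5, "0-5s"), (10, "6-10s"), (30, "11-30s"), (60, "31-60s")]:
        if seconds <= upper:
            return label
    return "60s+"

# e.g. when the user leaves a screen:
# tag_event("Viewed Product Page", {"time_on_screen": bucket_seconds(elapsed)})
```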

Related

How can I see the daily instance usage on App Engine?

The App Engine dashboard has an Instances view, but it shows the instances at an hourly interval. Is there a way to set this interval to one day, so I can see the daily sum of instance usage for the last week, for example?
I tried clicking the gray pills, but nothing happens. The arrow on the right reveals the three metrics shown, but they cannot be clicked either.
To monitor the instance count of my App Engine services for up to 30 days, I go to [1].
There you can filter by service name and time.
[1] https://console.cloud.google.com/appengine/instances
Here you may find an overview of all the monitoring tools GCP offers for application performance management.
Additionally, you can use the Metrics Explorer to display App Engine metrics; all versions of your GAE app are displayed concurrently in one chart.
There you can select the metric you would like to display, for example "Memory Usage". Furthermore, you can reduce the amount of data returned for a metric by combining data from similar time series, using the "Group By" and "Aggregator" options.
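As a rough sketch of the same aggregation done programmatically, assuming the google-cloud-monitoring client library and the appengine.googleapis.com/system/instance_count metric (treat the names and arguments as assumptions to verify against the docs):

```python
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/YOUR_PROJECT_ID"  # assumption: your GCP project id

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 30 * 24 * 3600}, "end_time": {"seconds": now}}
)
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 24 * 3600},  # one data point per day
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_MEAN,
        "cross_series_reducer": monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
        "group_by_fields": ["resource.label.module_id"],  # one series per service
    }
)
results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "appengine.googleapis.com/system/instance_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        "aggregation": aggregation,
    }
)
for series in results:
    print(series.resource.labels, [p.value for p in series.points])
```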
I hope this information helps.

10,000 HTTP requests per minute performance

I'm fairly experienced with web crawlers; however, this question is about performance and scale. I need to request and crawl 150,000 URLs on an interval (most URLs are recrawled every 15 minutes, which works out to about 10,000 requests per minute). These pages have a decent amount of data (around 200 KB per page). Each of the 150,000 URLs exists in our database (MSSQL) with a timestamp of the last crawl date and an interval, so we know when to crawl it again.
This is where we get an extra layer of complexity. The site does have an API, which allows up to 10 items per call. The information we need exists partially only in the API and partially only on the web page. The owner is allowing us to make web calls and their servers can handle it; however, they cannot update their API or provide direct data access.
So the flow should be something like: get 10 records from the database whose intervals have passed and need to be crawled, then hit the API. Each item in the batch of 10 then needs its own separate web request. Once a request returns the HTML, we parse it and update the records in our database.
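A rough sketch of that flow in Python with asyncio + aiohttp; get_due_records, call_api, parse and update_record are hypothetical stand-ins for the MSSQL and vendor-API layers:

```python
import asyncio
import aiohttp

# get_due_records, call_api, parse and update_record are hypothetical
# stand-ins for your MSSQL and vendor-API layers.

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()

async def process_batch():
    records = get_due_records(limit=10)           # rows whose interval has passed
    api_data = call_api([r.id for r in records])  # one API call covers the 10 items
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch(session, r.url) for r in records))
    for record, html in zip(records, pages):
        update_record(record, api_data[record.id], parse(html))
```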
I am interested in getting some advice on the correct way to handle the infrastructure. Assuming a multi-server environment, here are some business requirements:
Once a URL record is ready to be crawled, we want to ensure it is only grabbed and run by a single server. If two servers check it out simultaneously and run it, that can corrupt our data.
The workload can vary; currently it is 150,000 URL records, but that can go much lower or much higher. While I don't expect more than a 10% change per day, having some sort of auto-scaling would be nice.
After each request returns the HTML, we need to parse it and update records in our database with the individual data pieces. Some host providers allow free incoming data but charge for outgoing, so ideally the code base that requests the webpage and then parses the data also has direct SQL access (as opposed to a micro-service approach).
I'm picturing something like a multi-server blocking collection (Azure queue?), auto-scaling VMs that poll the queue, and a single database host server that is also queried by an MVC app that displays data to users.
Any advice or critique is greatly appreciated.
Messaging
I echo Evandro's comment and would explore Service Bus Message Queues or Event Hubs for loading a queue to be processed by your compute nodes. Message Queues support record locking, which, based on your write-up, might be attractive.
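As a rough sketch of that locking pattern, assuming the azure-servicebus Python SDK (v7) and a made-up queue name; the default peek-lock mode keeps a message invisible to other workers until it is completed or abandoned:

```python
from azure.servicebus import ServiceBusClient

CONN_STR = "<service-bus-connection-string>"  # assumption: your namespace's connection string
QUEUE_NAME = "crawl-queue"                    # assumption: queue holding due URL records

with ServiceBusClient.from_connection_string(CONN_STR) as client:
    with client.get_queue_receiver(queue_name=QUEUE_NAME) as receiver:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            try:
                record_id = str(msg)            # body carries the URL record to process
                # ... call the API, fetch/parse the HTML, update the database ...
                receiver.complete_message(msg)  # success: remove from the queue
            except Exception:
                receiver.abandon_message(msg)   # release the lock so another worker retries
```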
Compute Options
I also agree that Azure Functions would provide a good platform for scaling your compute/processing operations (calling the API and scraping the HTML). In addition, Azure Functions can be triggered by Message Queues, Event Hubs, or Event Grid. [Note: Event Grid allows you to connect various Azure services (pub/sub) with durable messaging, so it might play a helpful middle-man role in your scenario.]
Another option for compute could be Azure Container Instances (ACI), as you could spin up containers on demand to process your records. ACI does not have the same auto-scaling capability that Functions does, though, and also does not support direct binding operations.
Data Processing Concern (Ingress/Egress)
Indeed, Azure does not charge for data ingress, but any data leaving Azure incurs an egress charge after the initial 5 GB each month. [https://azure.microsoft.com/en-us/pricing/details/bandwidth/]
You should be able to have the Azure Functions handle calling the API, scraping the HTML and writing to the database. You might have to break those up into separate Functions, but you can chain Functions together easily, either directly or with Logic Apps.
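For a flavor of what one such Function might look like, a minimal sketch using the Azure Functions Python programming model; the queue name and the connection app-setting name are assumptions:

```python
import azure.functions as func

app = func.FunctionApp()

@app.service_bus_queue_trigger(arg_name="msg",
                               queue_name="crawl-queue",           # assumption
                               connection="ServiceBusConnection")  # app-setting name, assumption
def crawl_worker(msg: func.ServiceBusMessage):
    url = msg.get_body().decode("utf-8")
    # call the API, fetch and parse the HTML, then write the results to the database
    ...
```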

DynamoDB - Do I need lots of read capacity to handle multiple getItem calls per page?

I'm using DynamoDB to store items that are necessary to deliver a specific webpage. However, for one page load, the web server may easily need hundreds of items from about 2-5 different tables. With only one read capacity unit, I can make only two eventually consistent DB calls per second. Obviously, if I need these items to deliver a webpage, I cannot wait one second for every DB call.
I already use BatchGetItem to reduce the number of calls. Do I now just need lots more read capacity, or am I getting something wrong?
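(For reference, a minimal boto3 sketch of BatchGetItem across two tables; the table and key names here are invented. One call can fetch up to 100 items:)

```python
import boto3

dynamodb = boto3.client("dynamodb")
resp = dynamodb.batch_get_item(
    RequestItems={
        "page_fragments": {"Keys": [{"id": {"S": "header"}}, {"id": {"S": "footer"}}]},
        "widgets": {"Keys": [{"id": {"S": "w-42"}}]},
    }
)
for table, items in resp["Responses"].items():
    print(table, items)
# Anything DynamoDB could not return (e.g. throttled keys) comes back in
# resp["UnprocessedKeys"] and should be retried.
```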
You should be thinking caching, not fetching.
Either AWS ElastiCache (memcached) or Varnish-like caching.
You can also implement in-process caching, using Google Guava for example.
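Guava is a Java library; as a rough Python analog of the same in-process idea, a TTL cache with the cachetools package (load_item below is a hypothetical DynamoDB fetch):

```python
from cachetools import TTLCache, cached

cache = TTLCache(maxsize=10_000, ttl=60)  # hold up to 10k items for 60 s

@cached(cache)
def get_item(table_name, item_id):
    return load_item(table_name, item_id)  # falls through to DynamoDB on a miss
```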
It's possible to tune your read capacity based on usage, and that's one of the advantages of using a hosted solution like DynamoDB. You can set up CloudWatch alarms, receive notifications through an SNS topic, and create a simple app to increase/decrease your capacity. There is a nice post about it at: http://engineeringblog.txtweb.com/2013/09/txtweb-scaling-with-dynamodb/
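The capacity change itself is a single API call; a hedged boto3 sketch of what such an app would do when the alarm fires (table name and numbers are placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")
dynamodb.update_table(
    TableName="page_fragments",  # placeholder table name
    ProvisionedThroughput={"ReadCapacityUnits": 200, "WriteCapacityUnits": 50},
)
```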

Get information from various sources

I'm developing an app that has to get some information from various sources (APIs and RSS) and display it to the user in near real-time.
What's the best way to do it:
1. Have a cron job that updates all accounts every 12h, and when a user requests one, update that account, save it to the DB, and show it to the user?
2. Have a cron job that updates all accounts every 6h, and when a user requests one, update the account and show it to the user without saving it to the DB?
Which approach is better? Which is faster? And which is the most scalable?
12h or 6h: you have to do the math yourself; you are the only one who knows how many sources there are, how your app is hosted, what bandwidth you have...
Have a look at http://developmentseed.org/portfolio/managing-news - it is Drupal-based and does what you need (and much more). You can either use it or dive into the code and see how it is done.

Google App Engine - How to implement the activity stream in a social network

I want some ideas on best practices for implementing an activity stream for a social network I'm building on App Engine (Python).
I first want to keep a log of all activities of each user, so that we have a history - i.e. someone became a friend, added a picture, changed their address, etc. This way we have a user's history available should we need it. It also means we can remove friendship joins and change user data but still have a historical log.
I also want to stream a user's activity to their friends. For this, only the last X activities need to be kept - that is, in the scenario where messages are sent to friends when an activity occurs.
It's pretty straightforward to design a history log - i.e. when, what, where. The complication is how we notify a user's friends of their activity.
In our app friendships are not mutual - they are based on the Twitter following model. Some accounts could have thousands of followers.
What is the best approach to model this?
Using a many-to-many join table and doing a costly query?
Using a feed class that fires a copy of the activity to all the subscribers - maybe into memcache? As there may be a need to fire thousands of messages, I would imagine a cron job would need to be used.
Any help, ideas, or thoughts on this.
Thx
There's a great talk by Brett Slatkin called Building Scalable, Complex Apps on App Engine from last year's Google I/O, in which the example is a Twitter-like application, where users' updates are pushed to their followers. Basically exactly what you're trying to do.
I highly recommend the video for anyone writing an App Engine app, it's really helpful.
Don't do joins. They're too expensive, you'll burn through your quota in no time.
You can use a task queue; it's a bit like a cron job (i.e. stuff happens outside of the original request), but you can start tasks at will. memcache would be good if you're OK with losing some activity whenever the cache is flushed...
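A hedged sketch of that fan-out with the first-generation App Engine Python taskqueue API; the /fanout URL and the follower lookup are assumptions:

```python
from google.appengine.api import taskqueue

def on_activity(user_id, activity_id):
    # Enqueue a fan-out task instead of writing to thousands of follower
    # feeds inside the original request.
    taskqueue.add(url="/fanout", params={"user": user_id, "activity": activity_id})

# The /fanout handler then pages through the user's followers and writes a
# copy of the activity into each follower's feed entities, re-enqueueing
# itself with a cursor if the follower list is large.
```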
