Get information from various sources - database

I'm developing an app that has to get information from various sources (APIs and RSS feeds) and display it to the user in near real time.
Which approach is best:
1. Have a cron job update all accounts every 12h, and when a user requests one, update that account, save it to the DB and show it to the user?
2. Have a cron job update all accounts every 6h, and when a user requests one, update the account and show it to the user without saving it to the DB?
Which is faster? And which is the most scalable?

12h or 6h: you have to do the math yourself. Only you know how many sources there are, how your app is hosted, what bandwidth you have, and so on.
Have a look at http://developmentseed.org/portfolio/managing-news (it is Drupal-based and does what you need, and much more). You can either use it or dive into the code to see how it is done.

Related

Can disabling Kentico web analytics increase performance?

Since we use other tools to collect analytics data, I think disabling web analytics is a good way to improve page load performance, for example faster loads or fewer requests. Would that make a difference?
Thanks in advance.
Block Kentico in your devtools using the network request blocker. Reload the page and measure its speed with your local Lighthouse or performance profiler.
Unblock Kentico and repeat the procedure.
Compare the results.
Repeat until satisfied.
By all means, if you're not using the functionality, I'd recommend disabling it. However, if you want to clean up that data, make sure you do so before disabling analytics; otherwise you won't be able to clean it up afterwards.
There is a scheduled task called "Remove analytics data". You'll want to edit that task and change the "Task data" value to 540 days, then manually run it. Then go back in, edit that task, change the value to 360, then manually run it. Then go back in, edit the task, change the value to 180 days and manually run it. Finally, go back in, change the value to 0 and manually run the task.
After you've run the task with 0 days, there should be no analytics data stored. You are then safe to disable analytics.
If you find you really need that data, you may want to take a backup of the database or just leave the data in your database. It's up to you.
Lastly, no need to cross post on SO and DevNet as DevNet picks up SO posts tagged with "kentico".
Adding accepted answer from DevNet.

Does QuickBooks have any kind of audit log?

QuickBooks allows users to change posted periods. How can I tell if a user does this?
I actually don't need an audit log, but just the ability to see recently added/edited data that has a transaction date that's over a month in the past.
In a meeting today it was suggested that we may need to refresh data for all our users going back as far as a year on a regular basis. This would be pretty time consuming, and I think unnecessary when the majority of the data isn't changing. But I need to find out how can I see if data (such as an expense) has been added to a prior period so I know when to pull it again.
Is there a way to query for data (in any object or report) based not on the date of the transaction, but based on the date it was entered/edited?
I'm asking this in regard to the QBO API; however, if you know how to find this information from the web portal, that may also be helpful.
QuickBooks has a ChangeDataCapture endpoint which is designed for exactly the purpose you are describing. It's documented here:
https://developer.intuit.com/app/developer/qbo/docs/api/accounting/all-entities/changedatacapture
The TLDR summary is this:
The change data capture (cdc) operation returns a list of objects that have changed since a specified time.
In other words, you can poll this endpoint and you'll only get back the data that has actually changed since the last time you called it.
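As a rough illustration of what a polling request might look like, here is a minimal sketch that only builds the request URL. The `entities` and `changedSince` query parameters come from the documentation linked above; the base URL, the realm ID, the chosen entity names and the `build_cdc_url` helper are assumptions made for the example, and the authorization headers the real request needs are left out. Check the details against the linked docs before relying on them.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed production base URL; verify against the linked documentation.
BASE_URL = "https://quickbooks.api.intuit.com/v3/company"


def build_cdc_url(realm_id, entities, changed_since):
    """Build a change data capture URL (hypothetical helper, no network call)."""
    query = urlencode({
        "entities": ",".join(entities),             # comma-separated entity names
        "changedSince": changed_since.isoformat(),  # timestamp of the last sync
    })
    return f"{BASE_URL}/{realm_id}/cdc?{query}"


# Hypothetical values: store the time of each successful sync and pass it next time.
last_sync = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
url = build_cdc_url("1234567890", ["Purchase", "Bill"], last_sync)
print(url)
```

Because the filter is on when a record changed rather than on its transaction date, an expense back-dated into a prior period should still show up in the next poll.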

Automatically push App Engine Datastore data to BigQuery tables

To move data from Datastore to BigQuery tables I currently follow a manual and time-consuming process: backing up to Google Cloud Storage and restoring to BigQuery. There is scant documentation on the restoring part, so this post is handy: http://sookocheff.com/posts/2014-08-04-restoring-an-app-engine-backup/
There is also a seemingly outdated article (with code) on how to do it: https://cloud.google.com/bigquery/articles/datastoretobigquery
I've been waiting for access to this experimental tester program that seems to automate the process, but I have had no access for months: https://docs.google.com/forms/d/1HpC2B1HmtYv_PuHPsUGz_Odq0Nb43_6ySfaVJufEJTc/viewform?formkey=dHdpeXlmRlZCNWlYSE9BcE5jc2NYOUE6MQ
For some entities, I'd like to push the data to BigQuery as it comes (inserts and possibly updates). For more business-intelligence-style analysis, a daily push is fine.
So, what's the best way to do it?
There are three ways of getting data into BigQuery:
through the UI
through the command line
via API
If you choose the API, you have two options: "batch" mode or the streaming API.
If you want to send data "as it comes", you need to use the streaming API. Every time you detect a change in your datastore (or maybe once every few minutes, depending on your needs), you call the insertAll method of the API. Note that the table must already exist, with a schema matching the structure of your datastore entities. (It can be created via the API too, if needed.)
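To make the streaming step more concrete, here is a minimal sketch of building the request body for an insertAll call from a list of changed entities. The `rows`, `insertId` and `json` fields are part of the insertAll request format; the entity dicts, the `key` field used as a row ID and the `build_insert_all_body` helper are hypothetical, and sending the request (with authentication) is left out.

```python
import json
import uuid


def build_insert_all_body(entities, id_field="key"):
    """Turn entity dicts into an insertAll request body (hypothetical helper)."""
    rows = []
    for entity in entities:
        rows.append({
            # insertId lets BigQuery de-duplicate retried rows on a best-effort basis.
            "insertId": str(entity.get(id_field) or uuid.uuid4()),
            # The keys of this dict must match the column names of the target table.
            "json": entity,
        })
    return {"kind": "bigquery#tableDataInsertAllRequest", "rows": rows}


# Hypothetical entities detected as changed since the last push.
changed = [{"key": "a1", "name": "first"}, {"key": "b2", "name": "second"}]
body = build_insert_all_body(changed)
print(json.dumps(body, indent=2))
```

Collecting changes for a few minutes and sending them in one body like this keeps the number of API calls down compared with one call per change.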
For your second requirement, ingesting data once a day, you have the full code in the link you provided. All you need to do is adjust the JSON schema to match that of your datastore and you should be good to go.

Database design, real time or batch processing

I am facing two options for how to update the database and do not know which one is better for my situation. There are three tables in the database, which are used to read and store users' information, such as URL history or some inputs.
With real-time updates, the database is accessible by users all the time, so a user can see the changes they make to the database immediately.
Batch processing hides the update from the user: the database is updated by parsing the log files, and this process runs every X hours. So users can only see their changes after X hours.
Apart from the advantage or disadvantage of users seeing synchronous versus asynchronous updates, what are the other benefits of choosing real-time or batch processing for updating the database?
Thanks
It all depends on the amount of traffic you expect. If you want to scale your application, asynchronous processing is always recommended. But that does not mean that your users have to wait for X hours. You can have the process run every 5 minutes or even every minute.
This way you will reduce concurrency issues and at the same time users will be able to see their updated history with a little bit of delay.
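A minimal sketch of this kind of short-interval asynchronous processing might look like the following: writes are buffered in memory and applied in one batch each time a scheduler calls `flush()`. The `MicroBatcher` class, the event shape and the `write_batch` callable are all hypothetical; in a real app the callable would perform the actual database write and the flush would be triggered every minute or every few minutes.

```python
import threading


class MicroBatcher:
    """Buffer writes in memory and apply them in one batch per flush."""

    def __init__(self, write_batch):
        self._write_batch = write_batch  # hypothetical callable doing the real DB write
        self._pending = []
        self._lock = threading.Lock()

    def record(self, event):
        """Called on the request path: cheap, no database access."""
        with self._lock:
            self._pending.append(event)

    def flush(self):
        """Called by a scheduler: write everything buffered so far."""
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._write_batch(batch)
        return len(batch)


stored = []  # stand-in for the database
batcher = MicroBatcher(stored.extend)
batcher.record({"user": 1, "url": "/home"})
batcher.record({"user": 2, "url": "/about"})
flushed = batcher.flush()
print(flushed, len(stored))
```

Only the flush touches the database, so concurrent requests contend on an in-memory list rather than on table rows. The trade-off is that buffered events are lost if the process dies before the next flush.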
See best practices for scalability in the book Scalability Rules
I would suggest you use EDA (Event Driven Architecture), which uses middleware to "glue" all of this together.
http://searchsoa.techtarget.com/definition/event-driven-architecture
One piece of advice: keep away from batch processes.
Today, everything tends to be more and more real-time. Imagine if you received my answer in X hours... would you be satisfied? :)
If you give us more info, we could also help you more.
I see that your input comes from a log file? Can this be changed?
You could also implement the observer pattern.
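For reference, a minimal sketch of the observer pattern applied to this case could look like this: anything interested in new history entries subscribes a callback, and the store notifies every subscriber as soon as an entry is added. The `HistoryStore` class and the entry shape are hypothetical names chosen for the example.

```python
class HistoryStore:
    """Subject: keeps entries and notifies observers on every change."""

    def __init__(self):
        self._observers = []
        self.entries = []

    def subscribe(self, callback):
        """Register a callable that receives each new entry."""
        self._observers.append(callback)

    def add(self, entry):
        self.entries.append(entry)
        for callback in self._observers:
            callback(entry)  # push the change instead of waiting for a batch run


seen = []  # stand-in for a view that shows the user their history
store = HistoryStore()
store.subscribe(seen.append)
store.add({"user": 1, "url": "/home"})
print(seen)
```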

Google App Engine - How to implement the activity stream in a social network

I want some ideas on best practices for implementing an activity stream for a social network I'm building on App Engine (Python).
First, I want to keep a log of all activities of each user so that we have a history, e.g. someone became a friend, added a picture, changed their address, etc. This way we have a user's history available should we need it. It also means we can remove friendship joins or change user data but still have a historical log.
I also want to stream a user's activity to their friends. For this, only the last X activities need to be kept, in the scenario where messages are sent to friends when an activity occurs.
It's pretty straightforward to design a history log, i.e. when, what, where. The complication is how we notify a user's friends of their activity.
In our app friendships are not mutual, i.e. they are based on the Twitter following model. Some accounts could have thousands of followers.
What is the best approach to model this?
1. Using a many-to-many join table and doing a costly query?
2. Using a feed class that fires a copy of the activity to all the subscribers, maybe into memcache? As there may be a need to fire thousands of messages, I would imagine a cron job would need to be used.
Any help, ideas, or thoughts on this are welcome.
Thx
There's a great talk by Brett Slatkin called Building Scalable, Complex Apps on App Engine from last year's Google I/O, in which the example is a Twitter-like application, where users' updates are pushed to their followers. Basically exactly what you're trying to do.
I highly recommend the video for anyone writing an App Engine app. It's really helpful.
Don't do joins. They're too expensive, and you'll burn through your quota in no time.
You can use a task queue. It's a bit like a cron job (i.e. stuff happens outside of the original request), but you can start tasks at will. memcache would be good if you're OK with losing some activity whenever the cache is flushed.
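The fan-out idea can be sketched in plain Python as follows. This is not App Engine code: the `deque` stands in for a real task queue, the `feeds` dict stands in for per-follower storage, and `FEED_LIMIT`, `BATCH_SIZE`, `publish` and `run_pending_tasks` are hypothetical names and values chosen for the example.

```python
from collections import defaultdict, deque

FEED_LIMIT = 50    # assumed "last X activities" kept per follower
BATCH_SIZE = 100   # assumed number of followers handled per task

task_queue = deque()  # stand-in for a real task queue
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # stand-in for feed storage


def publish(activity, follower_ids):
    """Request side: enqueue one task per batch of followers, then return."""
    for start in range(0, len(follower_ids), BATCH_SIZE):
        task_queue.append((activity, follower_ids[start:start + BATCH_SIZE]))


def run_pending_tasks():
    """Worker side: copy the activity into each follower's bounded feed."""
    while task_queue:
        activity, batch = task_queue.popleft()
        for follower_id in batch:
            feeds[follower_id].appendleft(activity)


followers = list(range(250))  # hypothetical follower IDs
publish({"actor": "alice", "verb": "added a picture"}, followers)
queued = len(task_queue)
run_pending_tasks()
print(queued, len(feeds))
```

Reading a follower's stream then becomes a single lookup of their own feed rather than a join across the follow table, and splitting the followers into batches keeps each task small even for accounts with thousands of followers.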

Resources