I'm looking at a CSV file hosted on an open government portal that updates daily, and I'd like to build a web app around it. I'm looking for the best approach, please advise. The options I'm currently considering:
Reading the CSV directly into the web app: this seems like a bad idea because it's a sizeable file (over 70 MB), which would likely make it take too long to load on the first user touch, or even with each query
Scheduling a task with something like AWS to read the file once a day and send its contents to a database such as MongoDB, either by overwriting the database completely or by reading it first and updating it with the newer entries, and then querying this database from the web app (a rough sketch of this option follows below)
Am I missing any better approaches?
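For what it's worth, here is a minimal sketch of option 2 as a plain Python script you could run on a daily schedule (cron, a small EC2 instance, a Lambda trigger, etc.); the CSV URL, database name, and the "record_id" key column are placeholders you would adapt to the actual file:

```python
# Minimal sketch: fetch the CSV once a day and upsert its rows into MongoDB.
# The URL, database/collection names, and the "record_id" key are placeholders.
import csv
import io

import requests
from pymongo import MongoClient, UpdateOne

CSV_URL = "https://data.example.gov/daily-export.csv"  # placeholder

def sync_csv_to_mongo():
    resp = requests.get(CSV_URL, timeout=120)
    resp.raise_for_status()

    rows = csv.DictReader(io.StringIO(resp.text))
    ops = [
        UpdateOne({"record_id": row["record_id"]},  # assumed unique key column
                  {"$set": row},
                  upsert=True)
        for row in rows
    ]

    coll = MongoClient("mongodb://localhost:27017")["portal"]["records"]
    if ops:
        coll.bulk_write(ops, ordered=False)

if __name__ == "__main__":
    sync_csv_to_mongo()  # schedule this once a day
```

Upserting on a stable key avoids re-inserting unchanged rows; a full overwrite (drop and reload the collection) is simpler if the file has no reliable key.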
My application currently runs on an App Engine server. It writes records (for logging and reporting) continuously.
Scenario: view counts on the website. When someone opens the website, it hits the server to add a record with the time and type of view. These counts are then shown in the user's dashboard.
These requests have become huge, currently around 40/sec. Google App Engine writes are getting heavy and the cost is increasing rapidly.
Is there any way to reduce this, or another database better suited to logging the views?
Google App Engine's Datastore is NOT suitable for a requirement like this, where you continuously write to the datastore and read much less often.
You need to offload this task to a third-party service (either one you write yourself or an existing one).
A better option for user tracking and analytics is Google Analytics (although you won't be able to show the hit counters directly on your website using Analytics).
If you want to show your users a page hit count, use a page hit counter: https://www.google.com/search?q=hit+counter
In this case you should avoid the Datastore.
For this kind of analytics it's best to do the following:
Dump the data to the GAE log (yes, this sounds counter-intuitive, but it's actually advice from Google engineers). The GAE log is persistent and is guaranteed not to lose data you write to it.
Periodically parse the log for your data and export it to BigQuery.
BigQuery has quite a powerful query language, so it's capable of producing complex analytics reports.
Luckily this has already been done: see the Mache framework. Also see the related video.
Note: there is now a new BigQuery feature called streaming inserts, which could potentially replace the cumbersome middle step (files on Cloud Storage) used in Mache.
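For illustration, here is a minimal sketch of such a streaming insert using the google-cloud-bigquery client; the project, dataset, table, and row schema are assumptions, not anything taken from Mache:

```python
# Hedged sketch: stream page-view events straight into BigQuery instead of
# staging files on Cloud Storage. The table id and row fields are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.analytics.page_views"  # hypothetical table

rows = [
    {"ts": "2014-05-01T12:00:00Z", "page": "/dashboard", "view_type": "page_view"},
]

errors = client.insert_rows_json(table_id, rows)  # streaming insert
if errors:
    print("Streaming insert failed:", errors)
```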
I've developed an app which needs to upload a small .xml file to a web server. There will be around 15 devices running this app, each uploading around 15 .xml files per day. The files need to be uploaded to the same directory.
What would be the best way to achieve this? I'm assuming I can't use the same login details for the server on every device; is there any hosting out there that allows multiple different logins?
Thanks.
Paul.
Take a look at Parse.com's data solution. You can set up a free account and have your devices post the data to a database using their API. It's pretty easy to set up with iOS, and for basic services it's free.
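To make that concrete, here is a rough sketch of what a device-side upload could look like against Parse's REST file endpoint as it was documented at the time; the app ID, REST key, and filename are placeholders:

```python
# Hedged sketch: upload one .xml file via Parse's REST API.
# Application ID, REST key, and filename are placeholders.
import requests

with open("report.xml", "rb") as f:
    resp = requests.post(
        "https://api.parse.com/1/files/report.xml",
        headers={
            "X-Parse-Application-Id": "YOUR_APP_ID",
            "X-Parse-REST-API-Key": "YOUR_REST_KEY",
            "Content-Type": "text/xml",
        },
        data=f.read(),
    )
resp.raise_for_status()
print(resp.json())  # response contains the stored file's name and URL
```

Since each device only needs the application ID and REST key rather than server login credentials, this sidesteps the shared-login problem.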
My friend and I are working on a GWT / Google App Engine project, using TortoiseSVN and Google Code to synchronize the code.
We also synchronize the local_db.bin file in the appengine-generated folder, but we can't get it to work. After synchronizing the db file, our local datastore is not updated as we expected.
That is a pain. I'm worried about our future, when our database gets bigger and more complicated.
Can anyone please give me some advice? What should I do to synchronize our local datastores?
I have two suggestions:
1) Use the remote API (https://developers.google.com/appengine/articles/remote_api) to share a GAE-hosted database locally, as in the sketch below.
2) Maybe you can use Google Drive to sync the folders.
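As a rough sketch of suggestion 1, following the pattern in the linked remote_api article (the hostname is a placeholder), you can point your local code at the hosted datastore like this:

```python
# Hedged sketch (Python 2.7 GAE SDK era): attach local code to the hosted
# datastore via remote_api so both developers see the same data.
import getpass

from google.appengine.ext.remote_api import remote_api_stub

def auth_func():
    return raw_input("Email: "), getpass.getpass("Password: ")

remote_api_stub.ConfigureRemoteApi(
    None,                       # app id is fetched from the server
    "/_ah/remote_api",          # path where the remote_api handler is mounted
    auth_func,
    "your-app-id.appspot.com",  # placeholder hostname
)

# From here on, normal datastore calls (db.get, queries, ...) go against the
# hosted datastore instead of the local one.
```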
This is a really bad idea. Even if you weren't having trouble making both ends read from the same datastore file, the local datastore is in a binary format, so you won't both be able to work on the app at the same time without getting merge conflicts you will be unable to resolve.
Instead, both for collaboration purposes and for testing and deployment, you should provide a set of test data you can easily load into the datastore. Store the test data in version control, and load it in using bulkloader or your own code.
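For example, here is a minimal sketch of the "your own code" route, assuming a hypothetical Page model and a test_data.json fixture file kept in version control:

```python
# Hedged sketch: load shared test fixtures into the (local) datastore.
# The Page model and the test_data.json layout are made-up examples.
import json

from google.appengine.ext import db

class Page(db.Model):
    title = db.StringProperty()
    body = db.TextProperty()

def load_fixtures(path="test_data.json"):
    with open(path) as f:
        records = json.load(f)
    for rec in records:
        # Using key_name keeps loads idempotent: re-running the loader
        # overwrites existing fixtures instead of duplicating them.
        Page(key_name=rec["slug"], title=rec["title"], body=rec["body"]).put()
```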
We have an application that we're deploying on GAE. I've been tasked with coming up with options for replicating the data that we're storing in the GAE datastore to a system running in Amazon's cloud.
Ideally we could do this without having to transfer the entire data store on every sync. The replication does not need to be in anything close to real time, so something like a once or twice a day sync would work just fine.
Can anyone with some experience with GAE help me out here with what the options might be? So far I've come up with:
Use the Google-provided bulkloader.py to export the data to CSV, and somehow transfer the CSV to Amazon and process it there
Create a Java app that runs on GAE, reads the data from the datastore, and sends it to another Java app running on Amazon.
Do those options work? What would be the gotchas with those? What other options are there?
You could use logic similar to what the App Engine HRD migration or backup tools do:
Mark modified entities with a child marker entity
Run a MapperPipeline using the App Engine mapreduce library, iterating over those entities with a Datastore Input Reader
In your map function, fetch the parent entity, serialize it to Google Storage using a File Output Writer, and remove the marker
Ping the remote host to import those entities from the Google Storage URL
As an alternative to steps 3 and 4, you could make multiple urlfetch (POST) calls to send each serialized entity to the remote host directly, but this is more fragile, as a single failure could compromise the integrity of your data import (see the sketch below).
You could look at the datastore admin source code for inspiration.
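If you go with the urlfetch alternative, a rough sketch of pushing one serialized entity to the remote side could look like this (the import endpoint is hypothetical):

```python
# Hedged sketch: serialize a modified entity and POST it to the remote
# importer running on Amazon. The endpoint URL is a placeholder.
from google.appengine.api import urlfetch
from google.appengine.ext import db

def push_entity(entity):
    payload = db.model_to_protobuf(entity).Encode()  # binary-serialize the entity
    result = urlfetch.fetch(
        url="https://replica.example.com/import",  # hypothetical endpoint
        payload=payload,
        method=urlfetch.POST,
        headers={"Content-Type": "application/octet-stream"},
        deadline=30,
    )
    if result.status_code != 200:
        raise RuntimeError("import failed: %d" % result.status_code)
```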
My problem is a bit complicated. Basically, I created a client/server CMS architecture that worked very well for a while. Now that there are more customers, it's getting very slow and I don't really know how to fix it.
Let me explain the current architecture:
I've developed a content management system to serve various different customers. There's a CMS server where each customer has an account to manage the content of his or her website. All customers work on the same interface and store their content in the same database on the CMS server. So, for every new customer, I just have to open a new account on the CMS server and they can start managing their content.
To display that content, I have to create a customized website for each customer. That website frontend can run on one of my servers, or the customer can host it himself. This frontend then has to connect to the CMS server to fetch the content.
On the CMS server there's a PHP file called "share.php". It accepts parameters such as 'content_ID' to specify the content, and then outputs that content in JSON format.
On the frontend, I use file_get_contents("{cms_server}/share.php?content_ID=34"); to retrieve the data from the CMS server.
As I said, it worked very well for some time, when there were few customers using the system. Now, however, a page load takes at least a few seconds, and it's getting worse.
Do I just need to increase performance on the CMS server, or does the concept of retrieving data with file_get_contents() just suck big time? :D
I'd appreciate your recommendations on how to fix this problem.
Cheers.
Probably you need to look at your database: do you need to add indexes? Are you making redundant calls? Are you making many small SELECTs which could be made into one big one? And so forth.
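As an illustration of the last two points (not your actual schema), here is what collapsing N per-ID queries into one and adding an index looks like, sketched with Python's sqlite3 module for brevity:

```python
# Illustrative only: replace a loop of single-row SELECTs with one IN query,
# and index the column you filter on. Table and column names are made up.
import sqlite3

conn = sqlite3.connect("cms.db")
cur = conn.cursor()

# Instead of one SELECT per content_ID inside a loop ...
ids = [34, 35, 36]
placeholders = ",".join("?" for _ in ids)
cur.execute("SELECT id, body FROM content WHERE id IN (%s)" % placeholders, ids)
rows = cur.fetchall()

# ... and make sure columns used in WHERE clauses are indexed:
cur.execute("CREATE INDEX IF NOT EXISTS idx_content_customer ON content(customer_id)")
conn.commit()
```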