I am attempting to create a half-decent substitute for Google Reader but am running into a problem. When I first set up the distribution named in the title, I was able to add a bunch of feeds, which displayed as hoped (most recent posts first).
My assumption was that every time I visited the site, the RSS feeds would update and show any new content. However, the only content displayed is that which was new the day I added the feeds.
How can I address this? In case it helps put my problem in context: I notice that if I add a new feed, all the other feeds update to their newest content.
Grateful for any help!
In the Managing News distribution, feeds are fetched when they are created and on each cron run.
So try setting up your cron jobs properly.
Also make sure the feeds you are fetching actually have new content (by manually inspecting the feed XML).
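If you would rather check a feed with a script than read the XML by eye, here is a minimal sketch. It assumes an RSS 2.0 feed whose items carry a pubDate element with a timezone offset; the function name newest_pub_date and the sample feed are made up for illustration and are not part of Managing News or Drupal.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def newest_pub_date(rss_xml):
    """Return the most recent <pubDate> found in an RSS 2.0 document, or None."""
    root = ET.fromstring(rss_xml)
    dates = []
    for item in root.iter("item"):
        text = item.findtext("pubDate")
        if text:
            # RSS 2.0 uses RFC 822 style dates, which email.utils can parse.
            dates.append(parsedate_to_datetime(text.strip()))
    return max(dates) if dates else None


# Hypothetical sample feed; in practice you would paste or download the real XML.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example</title>
<item><title>Old</title><pubDate>Mon, 02 Jan 2012 10:00:00 +0000</pubDate></item>
<item><title>New</title><pubDate>Tue, 03 Jan 2012 09:30:00 +0000</pubDate></item>
</channel></rss>"""

print(newest_pub_date(SAMPLE_FEED))
```

If the newest date in the feed is later than the newest post your site shows, the feed itself is fine and the problem is most likely that cron is not running.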
I want to build a news aggregator application. My one problem is that I don't know how I should pick up new articles from news web pages.
I wrote a scraper script in Python. When I run it, it takes all the news from the source (published on the day it runs) and saves it to a CSV file (I save: URL, Title, Date, Time, Image URL, Category, Content). When I run the script again, it checks the CSV file to see which URLs it has already processed, so it does not write duplicate content, only the new content. At the end, I want to write these results to my database.
But with this script I have to run it periodically (let's say every 10 minutes) to check whether new content has been published.
Is this the right way to accomplish this?
Is there a better way to listen to news sources, one that can tell when new content is published?
If this is the way to do it, how can I set the script to run periodically?
Greatly appreciate your help.
You wrote: "I run the script again it checks with the CSV file if it processed the URLs so it does not write duplicate content, only writes the new content."
You might add to your question:
the website address
the Python code that you've already written
My suggestion for you: get the most recent URLs from the DB (say 100-200; the number should be comparable with the number of URLs on the web page you scrape) and check them against the URLs currently on the web page. If new URLs appear, scrape them.
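The comparison can be sketched as below. This is only an illustration: find_new_urls and the example URLs are hypothetical, and fetching the recent URLs from your database and the current URLs from the page is left to your existing code.

```python
def find_new_urls(page_urls, recent_db_urls):
    """Return URLs on the page that are not among the recent DB URLs.

    The order of the page is preserved.
    """
    seen = set(recent_db_urls)
    new_urls = []
    for url in page_urls:
        if url not in seen:
            new_urls.append(url)
            seen.add(url)  # also drops duplicates within the page itself
    return new_urls


# Hypothetical data standing in for a DB query and a scraped page.
recent = ["https://example.com/a", "https://example.com/b"]
on_page = ["https://example.com/c", "https://example.com/b", "https://example.com/c"]
print(find_new_urls(on_page, recent))
```

Using a set for the lookup keeps each check cheap, so comparing a few hundred URLs on every run costs next to nothing.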
We are working on surveys using ODK: we create XLS files, transform them into forms, and then collect data offline.
When employees come back from the field, they upload the data.
What we need now is for them to work online from the field, so they can search for a specific ID or name and see the existing data before adding new data.
What I mean is that we need to let them search the database by a specific field, and that is not available in ODK.
We upload data to ONA; the data is then cleaned on the laptop, and the searches are done on the laptop too.
Is there a tool that does this?
As far as I know, the closest you can get with existing tools is this: https://help.ona.io/faq/filtered-datasets
If you use Enketo (webforms), the webform is automatically updated when the source dataset is updated (via new submissions), although this may require a page refresh and there will be a delay. You could use either offline-capable or online-only webforms with this reference to external data, and query it with select_one_from_file or select_multiple_from_file (in XLSForm terminology), with pulldata, or with regular XPath.
I have a WordPress site that shows posts it gets from RSS using a plugin.
I want to split the "jobs" across 2 different websites, because the current configuration is not working properly (there is a high number of RSS sources to parse).
I need to set up one site to get the posts and the other to show them in a template; I don't want to have the RSS plugin on the site that displays the posts.
Is this possible using the same database and the same content but a different WordPress configuration?
Is there another solution, with 2 different databases and one automatically updated from the other?
Any ideas other than these 2?
Thanks
If I understand your problem correctly, the right solution depends on how much control you have over your server.
First, I think you only need one website. It sounds like your website is displaying items directly from the many RSS feeds, which will be slow. Instead, I would recommend creating a process that imports the posts on a daily basis and writes them to your database.
This will probably take the form of a cron job on your server.
However, if you don't have the access to do this, I would recommend creating pages in the back end that import the RSS content when you click import.
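To show the shape of such an import process, here is a rough sketch in Python rather than WordPress/PHP code. It assumes RSS 2.0 items with link and title elements and uses an SQLite table named posts purely for illustration; import_feed and the sample feed are hypothetical names, and a real WordPress site would write to its own posts table instead.

```python
import sqlite3
import xml.etree.ElementTree as ET


def import_feed(conn, rss_xml):
    """Store each feed item once, keyed by its link; return the number of new rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (link TEXT PRIMARY KEY, title TEXT)"
    )
    added = 0
    for item in ET.fromstring(rss_xml).iter("item"):
        link = item.findtext("link")
        title = item.findtext("title") or ""
        if not link:
            continue  # skip items that cannot be identified
        cur = conn.execute(
            "INSERT OR IGNORE INTO posts (link, title) VALUES (?, ?)",
            (link.strip(), title.strip()),
        )
        added += cur.rowcount  # 1 if inserted, 0 if the link already existed
    conn.commit()
    return added


# Hypothetical feed content; a scheduled job would download the real feeds.
FEED = """<rss version="2.0"><channel>
<item><title>First</title><link>https://example.com/1</link></item>
<item><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""

conn = sqlite3.connect(":memory:")
print(import_feed(conn, FEED))  # first run stores both items
print(import_feed(conn, FEED))  # second run finds nothing new
```

Because the import runs on a schedule and the pages read only from the database, visitors never wait for the remote feeds to be parsed.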
I'm new to Drupal. I have a site running version 7.12 that I need to make changes to: a complete relaunch. For that, I need to provide a list of all the content type pages that are frequently searched for and the pages that never get hit, so that I can remove them. Is there a way to see that list?
Drupal only tracks this information if the Statistics module was turned on. You can see how to turn it on here:
https://drupal.org/documentation/modules/statistics
Another possible solution: if you had Google Analytics set up on your site, you could see all of that information there.
I'm not sure whether I am correct, but I found that my site nerdeky.com, which uses a view to show the latest nodes on the front page, is crashing because the view loads all nodes at once.
I have a content type called "Nerdeky Info" which contains more than 500 posts. To show the latest 12 posts on the front page, I created a view and applied a pager for the rest of the posts. It worked well while I had 100 or 200 posts, but now it is slow to load and sometimes crashes. I have New Relic integrated with my server, and I can see that whenever the site crashes, it reports that the view took most of the PHP and database processing time.
After some searching on the forum, I found that my view is currently loading all posts at once and only then showing the latest Nerdeky posts.
Please suggest how to make it lighter, so that only the latest posts load first.
Thanks
Bharat sharma