Print News Headlines with Python/JSON - arrays

I'm having difficulty properly parsing an array. I realize this is a newb error, so please forgive me.
Example:
import urllib2
import json
import sys

print "Good Morning, Rusty"
i = 0
print "From USA Today: Top Headlines"
f = urllib2.urlopen('http://api.usatoday.com/open/articles/topnews?encoding=json&api_key=98j............v5a93qs')
json_string = f.read()
parsed_json = json.loads(json_string)
for i in parsed_json[0]['stories']['title']:
    print json.dump(i)
f.close()
There's one major section called stories, and under it multiple occurrences of description, title, link, pubDate and several other fields.
I simply want to print the dozen or so titles presented by that JSON.
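Based on that description, the parsed structure presumably looks something like this (values invented for illustration):

{"stories": [
    {"title": "...", "link": "...", "description": "...", "pubDate": "..."},
    ...
]}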

Well, I did more learning and research, and at least got code that would print the top 5 headlines. Here's what I solved it with:
json_string = f.read()
parsed_json = json.loads(json_string)
for i in range(5):  # first five stories
    title = parsed_json['stories'][i]['title']
    link = parsed_json['stories'][i]['link']
    print title
    print link
    print "-----------------------------------"

Related

Incrementing over a URL variable

import urllib2
import pandas as pd
from bs4 import BeautifulSoup

x = 0
i = 1
data = []
while (i < 13):
    soup = BeautifulSoup(urllib2.urlopen(
        'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex=' % i, +str(x)).read(), 'html')
    tableStats = soup.find("table", ("class", "playerTableTable tableBody"))
    for row in tableStats.findAll('tr')[2:]:
        col = row.findAll('td')
        try:
            name = col[0].a.string.strip()
            opp = col[1].a.string.strip()
            rec = col[10].string.strip()
            yds = col[11].string.strip()
            dt = col[12].string.strip()
            pts = col[13].string.strip()
            data.append([name, opp, rec, yds, dt, pts])
        except Exception as e:
            pass
    df = pd.DataFrame(data=data, columns=['PLAYER', 'OPP', 'REC', 'YDS', 'TD', 'PTS'])
    df
    i += 1
I have been working on a fantasy football program and I am trying to increment the data over all weeks so I can create a dataframe of the top 40 players for each week.
I have been able to get it for any week of my choice by manually entering the week number in the PeriodId part of the URL, but I am trying to programmatically increment it over each week to make it easier. I have tried using PeriodId='+ i +' and PeriodId=%d, but I keep getting various errors about concatenating str and int and about bad operand types. Any suggestions or tips?
Try removing the comma between % i and +str(x) to concatenate the strings, and see if that helps. With the comma there, +str(x) is passed to urlopen as a separate argument, and the unary + applied to a string is what raises the bad operand error.
soup = BeautifulSoup(urllib2.urlopen('http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex='%i, +str(x)).read(), 'html')
should be:
soup = BeautifulSoup(urllib2.urlopen('http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex='%i +str(x)).read(), 'html')
If you have trouble concatenating or formatting a URL, build it in a separate variable instead of writing it all on one line with BeautifulSoup and urllib2.urlopen.
Use parentheses to format with multiple values, like "before %s is %s" % (1, 0):
url = 'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=%s' % (i, x)
# or
#url = 'http://games.espn.com/ffl/tools/projections?&slotCategoryId=4&scoringPeriodId=%s&seasonId=2018&startIndex=0' % i
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
Making the code shorter this way will not affect performance.
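Putting the pieces together, here is a minimal sketch of the weekly loop with the URL built up front. The parameter names come from the question; whether the 2018 endpoint still responds is another matter:

import urllib2
from bs4 import BeautifulSoup

base = ('http://games.espn.com/ffl/tools/projections'
        '?&slotCategoryId=4&scoringPeriodId=%d&seasonId=2018&startIndex=%d')

for week in range(1, 13):  # weeks 1 through 12
    url = base % (week, 0)  # startIndex=0 for the first page of players
    soup = BeautifulSoup(urllib2.urlopen(url).read(), 'html.parser')
    # ...parse tableStats as in the question...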

python: range not being executed

The app executes, but the loop over the range doesn't seem to: in my CSV file, it only shows the first entry. I've also come across index-out-of-range errors when scraping other fields. Any help would be appreciated; I'm learning.
import requests
import csv
from bs4 import BeautifulSoup
f = csv.writer(open('salons.csv', 'w'))
f.writerow(['Name'])
pages = []
for i in range(0, 10600):
    url = 'http://www.aveda.com/locator/get_the_facts.tmpl?SalonID=' + str(i) + ' '
    pages.append(url)

for item in pages:
    page = requests.get(item)
    soup = BeautifulSoup(page.text, 'lxml')
    salon_name_list = soup.find(class_='getthefacts__store_meta_info--store_phone')
    salon_name_list_items = salon_name_list.find_all('li', class_='phone')
    for salon_name in salon_name_list_items:
        names = salon_name.contents[0]
        f.writerow([names])
The way you tried to find phone numbers is not how you should do it. The phone numbers are within an a tag under the class name phone. Try this instead; it will fetch the phone numbers you are interested in:
import requests
import csv
from bs4 import BeautifulSoup

outfile = open('salons.csv', 'w')
writer = csv.writer(outfile)
writer.writerow(['Name'])

for i in range(0, 10600):
    url = 'http://www.aveda.com/locator/get_the_facts.tmpl?SalonID={0}'.format(i)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'lxml')
    for salon_name in soup.select('.phone a'):
        names = salon_name.text
        print(names)
        writer.writerow([names])

outfile.close()
Not sure how you have indented your code. Format it properly in the question. And you may not need two for loops.
import requests
import csv
from bs4 import BeautifulSoup

f = csv.writer(open('salons.csv', 'w'))
f.writerow(['Name'])

for i in range(0, 10600):
    url = 'http://www.aveda.com/locator/get_the_facts.tmpl?SalonID=' + str(i) + '/'
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'lxml')
    salon_name_list = soup.find(class_='getthefacts__store_meta_info--store_phone')
    if salon_name_list is None:  # page has no phone block; skip to avoid AttributeError
        continue
    salon_name_list_items = salon_name_list.find_all('li', class_='phone')
    for salon_name in salon_name_list_items:
        names = salon_name.contents[0]
        f.writerow([names])
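Since many of the 10,600 SalonID values likely do not correspond to a real salon page, it can also help to skip failed responses before parsing. A minimal addition inside the loop, assuming missing IDs come back with a non-200 status:

page = requests.get(url)
if page.status_code != 200:  # no salon with this ID; move on
    continue
soup = BeautifulSoup(page.text, 'lxml')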

How do I get matches from a text file and output them in an array?

I'm using a text file with lines of movies. If a user inputs Oz, I want to output all the movies in the file that have the word Oz in it.
This is what I have so far.
puts "Enter the keyword you want to search for: "
keyword = gets
movies_file = File.new("movies.txt", "r")
movies = movies_file.read
movies_list = movies.split(" ")
match_list = []
movies_list.each do |w|
matchObj = w.match(keyword)
if matchObj then
matchlist.push(matchObj.captures[0])
end
end
match_list.each do |title|
puts title
end
Presuming you've got the file organized like this:
Wizard of Oz
Battlefield Earth
Twilight
Ozymandias
Then you can read it in this way:
lines = File.readlines('movies.txt').map(&:chomp)
Then to find matching lines (note that gets leaves a trailing newline on keyword, and grep needs a Regexp to do substring matching; with a plain String it only returns exact matches):
matches = lines.grep(Regexp.new(keyword.chomp))
There's no need for all the each stuff. Also, the then on an if is almost never written there; it's just useless decoration.

Find string in log files and return extra characters

How can I get Python to loop through a directory and find a specific string in each file located within that directory, then output a summary of what it found?
I want to search the log files for the following string:
FIRMWARE_VERSION = "2.15"
Only, the firmware version can be different in each file. So I want the log file to report back with whatever version it finds.
import glob
import os

print("The following list contains the firmware version of each server.\n")
os.chdir("LOGS\\")

for file in glob.glob('*.log'):
    with open(file) as f:
        contents = f.read()
        if 'FIRMWARE_VERSION = "' in contents:
            print(file + " = ???")
I was thinking I could use something like the following to return the extra characters but it's not working.
file[:+5]
I want the output to look something like this:
server1.web.com = FIRMWARE_VERSION = "2.16"
server2.web.com = FIRMWARE_VERSION = "3.01"
server3.web.com = FIRMWARE_VERSION = "1.26"
server4.web.com = FIRMWARE_VERSION = "4.1"
server5.web.com = FIRMWARE_VERSION = "3.50"
Any suggestions on how I can do this?
You can use a regex to grab the text:
import glob
import re

for file in glob.glob('*.log'):
    with open(file) as f:
        contents = f.read()
        if 'FIRMWARE_VERSION = "' in contents:
            print(file + ' = ' + re.search(r'FIRMWARE_VERSION = "([\d.]+)"', contents).group(1))
In this case re.search does the job, scanning the file contents with the following pattern:
r'FIRMWARE_VERSION = "([\d.]+)"'
which captures a dotted version number between the two double quotes. Alternatively, you can use the following, which matches anything between the double quotes right after FIRMWARE_VERSION =:
r'FIRMWARE_VERSION = (".*")'
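One caveat: re.search returns None for a file that does not contain the string, so it is safer to guard the lookup inside the loop. A small sketch that also reproduces the quoted format from the desired output above:

match = re.search(r'FIRMWARE_VERSION = "([\d.]+)"', contents)
if match:  # only report files that actually contain a version line
    print(file + ' = FIRMWARE_VERSION = "' + match.group(1) + '"')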

insert URLs in kunamed.bst

I am using LyX and would like to add URLs to my @misc entries in my BibTeX bibliography. I am using the kunamed.bst style for my bibliography. My BibTeX entries look like this:
@misc{RFA2011,
  author = {RFA},
  booktitle = {Statistics},
  title = {{Renewable Fuels Association}},
  url = {http://www.ethanolrfa.org/pages/statistics},
  urldate = {Jan 13th 2014},
  year = {2014}
}
I have tried to change the FUNCTION {misc} in the kunamed.bst file from this:
FUNCTION {misc}
{ output.bibitem
  format.authors output
  author format.key output
  output.year.check
  format.title output
  format.date output
  new.block
  howpublished output
  new.block
  note output
  fin.entry
}
to this:
FUNCTION {misc}
{ output.bibitem
  format.authors output
  author format.key output
  output.year.check
  format.title output
  format.date output
  format.url output % <------
  new.block
  howpublished output
  new.block
  note output
  fin.entry
}
No change is happening in my bibliography. Any ideas how to fix this?
I solved the problem myself by changing the way the URL is written in the BibTeX entry, routing it through the howpublished field (which the misc function already outputs):
@misc{RFA2011,
  author = {RFA},
  booktitle = {Statistics},
  title = {{Renewable Fuels Association}},
  howpublished = "\url{http://www.ethanolrfa.org/pages/statistics}", % <---
  year = {2014},
  note = "[Accessed online: 13-Jan-2014]"
}
Note that \url{} requires the url (or hyperref) package to be loaded in the document.