Scraping table with multiple pages throwing AttributeError - loops

I am trying to loop through multiple pages on this website I am scraping with BS.
pg = soup.find('ul', 'pagination')
current_pg = pg.find('li', 'active')
next_url = current_pg.findNextSibling('li').a.get('href')
Any ideas on how to solve the AttributeError: 'NoneType' object has no attribute 'get'?
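The error usually means the chain broke one step earlier: on the last page the active <li> has no following <li>, so findNextSibling('li') returns None and the .a / .get('href') call then blows up. A minimal sketch with made-up markup (the class names follow your snippet; the HTML is illustrative):

```python
from bs4 import BeautifulSoup

# Made-up markup: the active <li> is the last item, as on the final page.
html = """
<ul class="pagination">
  <li><a href="?page=19">19</a></li>
  <li class="active"><a href="?page=20">20</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
current_pg = soup.find("ul", "pagination").find("li", "active")
next_li = current_pg.find_next_sibling("li")

# On the last page next_li is None, so guard before touching .a or .get():
next_url = next_li.a.get("href") if next_li and next_li.a else None
print(next_url)  # None here -> time to stop the loop
```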

To get to the next page, parse the href attribute from the <a rel="next"> tag. If that <a> doesn't exist, exit the loop:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://nicn.gov.ng/judgement?page=1"

while True:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    # parse the table and print some data:
    df = pd.read_html(str(soup))[0]
    print(url)
    print(df.tail())
    print("-" * 80)

    next_link = soup.select_one("a[rel=next]")
    if not next_link:
        break
    url = next_link["href"]
This prints all 20 pages:
...
--------------------------------------------------------------------------------
https://nicn.gov.ng/judgement?page=19
S/N Suit No Case Title Parties Respondents Justice Judgment Date
95 96 NICN/ABJ/273/2014 CHUKUEZI CHINEDU GOODWILL VS SIRAJ NIGERIA LTD CHUKUEZI CHINEDU GOODWILL SIRAJ NIGERIA LTD HON. JUSTICE E.D. E ISELE view judgment 1970-01-01
96 97 NICN/ABJ/302/2012 BASIL ONYEBUCHI OKORO VS NIGERIA NATIONAL PETROLEUM CORPORATION BASIL ONYEBUCHI OKORO NIGERIA NATIONAL PETROLEUM CORPORATION HON. JUSTICE E.D. E ISELE view judgment 1970-01-01
97 98 NICN/ABJ/340/2013 CHINEKWU NNENNA UDOKWU VS ZENITH BANK PLC CHINEKWU NNENNA UDOKWU ZENITH BANK PLC HON. JUSTICE E.D. E ISELE view judgment 1970-01-01
98 99 NICN/ABJ/202/2013 IBRAHIM MUSLIM AYOADE VS NIGERIA BOTTLING COMPANY LTD IBRAHIM MUSLIM AYOADE NIGERIA BOTTLING COMPANY LTD HON. JUSTICE E.D. E ISELE view judgment 1970-01-01
99 100 NICN/ABJ/246/2013 MR. MAHA ISIAKA ABU VS SKYE BANK (FORMERLY MAINSTREET BANK LIMITED) MR. MAHA ISIAKA ABU SKYE BANK (FORMERLY MAINSTREET BANK LIMITED) HON. JUSTICE E.D. E ISELE view judgment 1970-01-01
--------------------------------------------------------------------------------
https://nicn.gov.ng/judgement?page=20
S/N Suit No Case Title Parties Respondents Justice Judgment Date
26 27 NICN/CA/141/2013 ENGR. PATRICK EDET OQUA VS ATTORNEY-GENERAL, CROSS RIVER STATE ENGR. PATRICK EDET OQUA ATTORNEY-GENERAL, CROSS RIVER STATE HONOURABLE JUSTICE E. N. AGBAKOBA view judgment 1970-01-01
27 28 NICN/LA/243/2013 Emmanuel Fagbamila V University of Lagos Emmanuel Fagbamila University of Lagos Hon. Justice P.O Lifu (JP) view judgment 0000-00-00
28 29 NIC/LA/03/2011 Sunday Olufelo VERSUS Schlumberger Anadrill Nigeria Ltd. . Schlumberger Support Nigeria Ltd.Schlumberger Ltd. Sunday Olufelo Schlumberger Anadrill Nigeria Ltd. . Schlumberger Support Nigeria Ltd.Schlumberger Ltd. Hon. Justice B. B. Kanyip - Presiding Judge Hon. Justice O. A. Obaseki-Osaghae view judgment 0000-00-00
29 30 NICN/LA/291/2012 CAPTAIN SOLOMON J. GAMRA V CHANCHANGI AIRLINES (NIG) LTD Captain Solomon J. Gamra Chanchangi Airlines (Nig) Ltd HON. JUSTICE O.A. OBASEKI-OSAGHAE view judgment 0000-00-00
30 31 NICN/CA/75/2013 MR. MATTHEW EBONG UDO V MR. MATTHEW EBONG UDO NATIONAL EXAMINATIONS COUNCIL (NECO) HON.JUSTICE O.A OBASEKI-OSAGHAE view judgment 0000-00-00
--------------------------------------------------------------------------------

You are getting that error because your selector does not match anything on the page you are trying to scrape.
The website you are trying to scrape has 20 pages, so you can simply edit the page number in the link for each request.
This goes through all the pages on the website and collects all the links from the tables.
from bs4 import BeautifulSoup
import requests

for x in range(1, 21):
    link = 'https://nicn.gov.ng/judgement?page={}'.format(x)
    web_info = requests.get(link).text
    soup = BeautifulSoup(web_info, 'lxml')

    # finding the table body on the page
    table = soup.find('tbody')

    # collecting all the rows
    rows = table.find_all('tr')

    # now you can examine each row for links
    for row in rows:
        link = row.find('a').attrs['href']
        print(link)

I don't think you even need bs4 here, or even requests (pd.read_html can fetch a URL directly), but the reason I used requests is to persist the same session across requests.
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:104.0) Gecko/20100101 Firefox/104.0'
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        params = {
            'page': 1
        }
        allin = []
        while params['page']:
            r = req.get(url, params=params)
            df = pd.read_html(r.content, attrs={'id': 'mytable'})[0]
            if 'next' in r.text:
                params['page'] += 1
                allin.append(df)
                continue
            params['page'] = False
        final = pd.concat(allin, ignore_index=True)
        print(final)


if __name__ == "__main__":
    main('https://nicn.gov.ng/judgement')
Output:
S/N Suit No ... Judgment Date
0 1 NICN/ABJ/67/2021 ... view judgment 2021-10-14
1 2 NICN/ABJ/62/2021 ... view judgment 2021-10-07
2 3 NICN/ABJ/304M/2020 ... view judgment 2021-10-05
3 4 NICN/ABJ/240/2018 ... view judgment 2021-07-28
4 5 SUIT NO. NICN/ABJ/185/2018 ... view judgment 2021-07-14
... ... ... ... ... ...
1895 96 NICN/ABJ/273/2014 ... view judgment 1970-01-01
1896 97 NICN/ABJ/302/2012 ... view judgment 1970-01-01
1897 98 NICN/ABJ/340/2013 ... view judgment 1970-01-01
1898 99 NICN/ABJ/202/2013 ... view judgment 1970-01-01
1899 100 NICN/ABJ/246/2013 ... view judgment 1970-01-01
[1900 rows x 8 columns]

Related

Google Data Studio - Running Weighted Average (based on % of total relative to corresponding data)

I'm trying to create a scorecard to show a 'running' weighted average - that is, depending on the user's selection from a drop-down, it can calculate a weighted average score based on the % relative to the corresponding data (i.e. the data selected using the drop-down).
Data Studio (within Tables) allows us to add a comparison calculation with 'Percentage of total * Relative to corresponding data' - which is perfect, since when the user changes the drop-down, the comparison calculation is updated based on the corresponding data, and we can see the weight of each row.
However, it doesn't appear to be possible to use the comparison calculation in further metric calculations. To calculate the weighted score, I would need to multiply the Score by the comparison calculation (i.e. the % of Total Orders of the corresponding data) and take the sum of the column.
To give an example (please see spreadsheet for example base data):
Month  Country  # Orders  Score
Apr    FR          1,195     67
Apr    DE            276     63
Apr    CH            788     58
Apr    ES            488     69
May    FR            495     62
May    DE          1,894     44
May    CH          1,496     53
May    ES          1,601     53
Jun    FR            286     71
Jun    DE            275     61
Jun    CH          1,041     69
Jun    ES          1,341     60
Jul    FR            660     64
Jul    DE          1,734     55
Average Score (non weighted) = 58.75
However, if I want to weight the scores based on the # Orders (i.e. the % of orders relative to corresponding data - which, for the purpose of this question, is left as the base data):
Month  Country  # Orders  Score  % of Total Orders (to base data)  Individual Weighted Score
Apr    FR          1,195     67              0.07                            4.97
Apr    DE            276     63              0.02                            1.08
Apr    CH            788     58              0.05                            2.84
Apr    ES            488     69              0.03                            2.09
May    FR            495     62              0.03                            1.90
May    DE          1,894     44              0.12                            5.17
May    CH          1,496     53              0.09                            4.92
May    ES          1,601     53              0.10                            5.27
Jun    FR            286     71              0.02                            1.26
Jun    DE            275     61              0.02                            1.04
Jun    CH          1,041     69              0.06                            4.46
Jun    ES          1,341     60              0.08                            4.99
Jul    FR            660     64              0.04                            2.62
Jul    DE          1,734     55              0.11                            5.92
Jul    CH          1,267     56              0.08                            4.40
Jul    ES          1,276     35              0.08                            2.77
Weighted Average Score: 55.71
Weighted Average Score = (Sum of Individual Weighted Scores) = 55.71
Q1 - how do I calculate, or create a column in Data Studio, for the " Individual Weighted Score" - i.e. how can we use the comparison calculation to make a new metric / field and calculate each row's weighted score?
Q2 - how do I display the result, i.e. the 'Running' Weighted Average Score, as a single scorecard? (the user doesn't need to see the full table)
Please see here for the Data Studio example.
Many thanks in advance,
Arran
The expected output fields can be recreated by first adding a fixed SUM(# Orders) field (titled Total Orders below), where values in all rows are the total of the # Orders field. In Google Data Studio, this currently requires reaggregation through a self blend and a cross join:
1) Blend Fields
Data Source: Table 1, Table 2
Dimension 1: Month
Dimension 2: Country
Metric 1: # Orders (Aggregation: SUM, Source Field: # Orders)
          Total Orders (Aggregation: SUM, Source Field: # Orders)
Metric 2: Score (Aggregation: SUM)
Date Range: Month (Auto)
2) Join Configuration
Join Operator: Cross
Join Condition: Cross joins don't require any conditions
3) Calculated Fields
Field 1: % of Total Orders (to base data)
  Formula: # Orders / Total Orders
  Aggregation: SUM
  Type: Numeric > Percent
Field 2: Individual Weighted Score
  Formula: (# Orders / Total Orders) * Score
  Aggregation: SUM
  Type: Numeric > Number
Publicly editable Google Data Studio report (embedded Google Sheets data source) to elaborate:
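For reference, the two calculated fields boil down to ordinary arithmetic; a minimal pandas sketch over a few of the example rows (numbers from the base data above) shows the same calculation outside Data Studio:

```python
import pandas as pd

# A few rows from the example base data.
df = pd.DataFrame({
    "Month":   ["Apr", "Apr", "May", "May"],
    "Country": ["FR", "DE", "FR", "DE"],
    "Orders":  [1195, 276, 495, 1894],
    "Score":   [67, 63, 62, 44],
})

total_orders = df["Orders"].sum()                    # fixed SUM(# Orders)
df["PctOfTotal"] = df["Orders"] / total_orders       # % of Total Orders
df["WeightedScore"] = df["PctOfTotal"] * df["Score"] # Individual Weighted Score

# The weighted average is the sum of the individual weighted scores.
weighted_avg = df["WeightedScore"].sum()
print(round(weighted_avg, 2))  # -> 54.79 for this subset of rows
```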

Displaying differences with PowerPivot

Based on the following Data:
Location Salesperson Category SalesValue
North Bill Bikes 10
South Bill Bikes 90
South Bill Clothes 250
North Bill Accessories 20
South Bill Accessories 20
South Bob Bikes 200
South Bob Clothes 400
North Bob Accessories 40
I have the following Sales PivotTable in Excel 2016
Bill Bob
Bikes 100 200
Clothes 10 160
Accessories 40 40
I would now like to display the difference between Bill and Bob and, importantly, be able to sort the table by that difference. I have tried adding the Sales a second time and displaying it as a difference to "Bill". This gives me the correct values but sorts according to the underlying sales value and not the computed difference.
Bill Bob Difference
Bikes 100 200 100
Clothes 10 160 150
Accessories 40 40 0
I am fairly sure I need to use some form of DAX calculation but am having difficulty finding out exactly how. Can anyone give me a pointer?
Create a measure for that calculation:
If Bill and Bob are columns in your table.
Difference = ABS(TableName[Bill] - TableName[Bob])
If Bill and Bob are measures:
Difference = ABS([Bill] - [Bob])
UPDATE: Expression to only calculate difference between Bob and Bill.
Create a measure (in this case DifferenceBillAndBob) and use the following expression.
DifferenceBillAndBob =
ABS (
SUMX ( FILTER ( Sales, Sales[SalesPerson] = "Bob" ), [SalesValue] )
- SUMX ( FILTER ( Sales, Sales[SalesPerson] = "Bill" ), [SalesValue] )
)
It is not tested but should work.
Let me know if this helps.
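The measure's logic (per-category sum for Bob minus the sum for Bill, then the absolute value) can be sanity-checked outside Power Pivot. A small pandas sketch of the example data (with the category typos in the sample normalized), pivoting and then sorting by the computed difference:

```python
import pandas as pd

sales = pd.DataFrame({
    "Salesperson": ["Bill", "Bill", "Bill", "Bill", "Bill", "Bob", "Bob", "Bob"],
    "Category": ["Bikes", "Bikes", "Clothes", "Accessories", "Accessories",
                 "Bikes", "Clothes", "Accessories"],
    "SalesValue": [10, 90, 250, 20, 20, 200, 400, 40],
})

# Pivot like the Excel PivotTable, then compute the absolute difference.
pivot = sales.pivot_table(index="Category", columns="Salesperson",
                          values="SalesValue", aggfunc="sum")
pivot["Difference"] = (pivot["Bob"] - pivot["Bill"]).abs()

# Sorting by the computed difference is then straightforward:
print(pivot.sort_values("Difference", ascending=False))
```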

Trying to sort two columns of an array in R

Assume my data (df) looks something like this:
Rank Student Points Type
3 Liz 60 Junior
1 Sarah 100 Junior
10 John 40 Senior
2 Robert 70 Freshman
13 Jackie 33 Freshman
11 Stevie 35 Senior
I want to sort the data according to the Points, followed by Rank column in descending and ascending order, respectively, so that it looks like this:
Rank Student Points Type
1 Sarah 100 Junior
2 Robert 70 Freshman
3 Liz 60 Junior
10 John 40 Senior
11 Stevie 35 Senior
13 Jackie 33 Freshman
So I did this:
df[order(df[, "Points"], df[, "Rank"]), ]
Resulted in this:
Rank Student Points Type
1 Sarah 100 Junior
10 John 40 Senior
11 Stevie 35 Senior
13 Jackie 33 Freshman
2 Robert 70 Freshman
3 Liz 60 Junior
Question: How do I fix this?
I'm trying to use the column headers because the column length/width may change which can affect my sorting if I use physical locations.
FYI: I've tried so many suggestions and none seems to work:
one, two, three and four...
Try this:
df[order(df$Points,decreasing=T,df$Rank),]
Rank Student Points Type
2 1 Sarah 100 Junior
4 2 Robert 70 Freshman
1 3 Liz 60 Junior
3 10 John 40 Senior
6 11 Stevie 35 Senior
5 13 Jackie 33 Freshman
going back to the basics :) http://www.statmethods.net/management/sorting.html
so, your code should be:
df <- df[order(-df$Points, df$Rank),]
Like m0nhawk pointed out in his comment, you probably have the data as strings. Strings are ordered character by character, which is why "10" sorts before "2".
You need to convert them to numeric first. Also, for decreasing order you need the argument decreasing = TRUE.
df[, "Rank"] <- as.numeric(df[, "Rank"])
df[, "Points"] <- as.numeric(df[, "Points"])
df[order(df[, "Points"], decreasing = TRUE, df[, "Rank"]), ]
If the data type is 'factor' this will not work, though. You can try the following:
df <- as.data.frame(df, stringsAsFactors = FALSE)
And then the three lines above will work.
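If you ever need the same thing in Python, pandas makes the per-column sort direction explicit, which sidesteps the decreasing= ambiguity; a sketch of the example data:

```python
import pandas as pd

df = pd.DataFrame({
    "Rank": [3, 1, 10, 2, 13, 11],
    "Student": ["Liz", "Sarah", "John", "Robert", "Jackie", "Stevie"],
    "Points": [60, 100, 40, 70, 33, 35],
    "Type": ["Junior", "Junior", "Senior", "Freshman", "Freshman", "Senior"],
})

# Numeric dtypes avoid the string-ordering trap ("10" < "2") described above;
# Points descends, Rank ascends, each with its own flag.
out = df.sort_values(["Points", "Rank"], ascending=[False, True])
print(out["Student"].tolist())
# -> ['Sarah', 'Robert', 'Liz', 'John', 'Stevie', 'Jackie']
```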

Database-like basic use

I would like to use R for basic database purpose with two data frames: the first data frame is a list of individuals with different features:
data = data.frame("individual"=c("Steve","Bob","Simon","Lisa"),
"feature1"=c(1,2,2,3),
"feature2"=c(3,4,1,NA))
the second data frame has feature descriptions:
description = data.frame("feature"=c(1,2,3,4,NA),
"label"=c("foot","golf","curling","ski","No answer"))
My goal is to make a third data frame with the individuals names followed by their features descriptions:
Steve foot curling
Bob golf ski
and so on...
sqldf: Try this three-way join:
library(sqldf)
data[is.na(data)] <- "NA"
description[is.na(description)] <- "NA"
sqldf("select d1.individual, d2.label, d3.label
from data d1
left join description d2 on d1.feature1 = d2.feature
left join description d3 on d1.feature2 = d3.feature"
)
The output is:
individual label label
1 Simon golf foot
2 Steve foot curling
3 Bob golf ski
4 Lisa curling No answer
subscripting
This solution assumes we have run the two <- "NA" lines above.
labels <- with(description, setNames(label, feature))
with(data,
data.frame(individual, labels[feature1], labels[feature2], stringsAsFactors = FALSE)
)
which gives the output:
individual labels.feature1. labels.feature2.
3 Steve foot curling
4 Bob golf ski
1 Simon golf foot
NA Lisa curling No answer
REVISED:
Use left join.
Handle NAs as regular values.
Add second solution.
For this task, match can be used.
cbind(data[1], as.data.frame(lapply(data[-1], function(x)
description$label[match(x, description$feature)])))
individual feature1 feature2
1 Steve foot curling
2 Bob golf ski
3 Simon golf foot
4 Lisa curling No answer
Just for fun a third approach using plyr and reshape2
require(reshape2)
require(plyr)
dcast(join(melt(data, id = "individual", value.name = "feature"), description),
individual ~ variable, value.var = "label")
individual feature1 feature2
1 Bob golf ski
2 Lisa curling No answer
3 Simon golf foot
4 Steve foot curling
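For comparison, the same lookup works in Python/pandas as well; a sketch where a feature -> label dictionary mirrors the description frame, with NAs falling back to "No answer" as in the solutions above:

```python
import pandas as pd

data = pd.DataFrame({
    "individual": ["Steve", "Bob", "Simon", "Lisa"],
    "feature1": [1, 2, 2, 3],
    "feature2": [3, 4, 1, None],
})

# feature -> label mapping; the NA case is handled afterwards via fillna.
labels = {1: "foot", 2: "golf", 3: "curling", 4: "ski"}

out = data.copy()
for col in ["feature1", "feature2"]:
    out[col] = out[col].map(labels).fillna("No answer")
print(out)
```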

How to generate a SQL query for this data?

I have a SQL Server 2005 table in which I store the book exchanges that take place between two students.
ExchangeID BookID ExchangeDate FromPersName ToPersName
1 23 23.12.2011 John Matt
2 22 15.02.2012 Billy Ken
3 23 27.12.2011 Matt Riddley
5 23 05.03.2012 Riddley Josh
6 22 08.03.2012 Ken Rachel
7 23 19.03.2012 Josh Laura
8 23 15.01.2013 Laura Mike
9 22 17.01.2013 Rachel Stephanie
I want to generate a report for a specified year that looks like this:
Year:2012
BookID PersonName ReceivingDate DeliveryDate
23 Matt 01.01.2012 27.02.2012
23 Riddley 27.02.2012 05.03.2012
23 Josh 05.03.2012 19.03.2012
23 Laura 19.03.2012 31.12.2012
22 Ken 01.01.2012 08.03.2012
22 Rachel 08.03.2012 31.12.2012
This is half a solution; you need to change it a little to make it work. I will not write the whole query for your homework. Put in some effort!
http://sqlfiddle.com/#!3/39d73/17
A table can join itself:
select ex1.BookID, ex1.toPersName, ex1.ExchangeDate as DeliveryDate,
       Exchanges.ExchangeDate as ReceivingDate
from Exchanges as ex1
inner join Exchanges
    on ex1.toPersName = Exchanges.FromPersName
