Introduction to Data Science

Web Scraping in Python

Author

Joanna Bieri
DATA101

Important Information

Announcements

NEXT WEEK - Data Ethics. This week you should be reading your book or articles.

Day 12 Assignment - same drill.

  1. Make sure you pull any new content from the class repo - then copy it over into your working directory.
  2. Open the file Day##-HW.ipynb and start doing the problems.
    • You can do these problems as you follow along with the lecture notes and video.
  3. Get as far as you can before class.
  4. Submit what you have so far - commit and push to Git.
  5. Take the daily check in quiz on Canvas.
  6. Come to class with lots of questions!

——————————

Web Scraping Ethical Issues

There are some things to be aware of before you start scraping data from the web.

  • Some data is private or protected. Just because you have access to a website's data doesn't mean you are allowed to scrape it. For example, when you log into Facebook or another social media site, you are granted special access to data about the people you are connected to. It is unethical to use that access to scrape their private data!

  • Some websites have rules against scraping and will cut off service to users who are clearly scraping data. How do they know? Web scrapers access the website very differently than regular users. If the site has a policy about scraping data then you should follow it and/or contact them about getting the data if you have a true academic interest in it.

  • The line between web scraping and plagiarism can be very blurry. Make sure that you are citing where your data comes from AND not just reproducing the data exactly. Always cite the source of your data and make sure you are doing something new with it.

  • Ethics are different depending on whether you are using the data for a personal project (e.g. you just want to check scores for your favorite team daily and print the stuff you care about) vs. using the project for your business or website (e.g. publishing information to drive clicks to your site/video/account or making money from the data you collect). In the latter case it is EXTRA important to respect the original owner of the data. Drive web traffic back to their site, check with them about using their data, etc.

The Ethical Scraper (from https://towardsdatascience.com/ethics-in-web-scraping-b96b18136f01):

I, the web scraper will live by the following principles:

  • If you have a public API that provides the data I'm looking for, I'll use it and avoid scraping altogether.
  • I will always provide a User Agent string that makes my intentions clear and provides a way for you to contact me with questions or concerns.
  • I will request data at a reasonable rate. I will strive to never be confused for a DDoS attack.
  • I will only save the data I absolutely need from your page. If all I need is OpenGraph meta-data, that's all I'll keep.
  • I will respect any content I do keep. I’ll never pass it off as my own.
  • I will look for ways to return value to you. Maybe I can drive some (real) traffic to your site or credit you in an article or post.
  • I will respond in a timely fashion to your outreach and work with you towards a resolution.
  • I will scrape for the purpose of creating new value from the data, not to duplicate it.

I am assuming all the examples below are for personal use.
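
To make those principles concrete, here is a minimal sketch of what a polite request might look like. The User-Agent text, contact email, page list, and one-second pause are placeholders, not anything these sites require - adjust them to your own situation.

import time
import requests

# Identify yourself and give people a way to contact you (placeholder text)
headers = {"User-Agent": "DATA101 student scraper (contact: your_email@example.com)"}

pages = [
    "https://www.scrapethissite.com/pages/simple/",
    "https://www.scrapethissite.com/pages/forms/",
]

for url in pages:
    response = requests.get(url, headers=headers)
    print(url, response.status_code)
    time.sleep(1)   # keep the request rate gentle - roughly one request per second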

—————–

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = 'colab'

from itables import show

# This stops a few warning messages from showing
pd.options.mode.chained_assignment = None 
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

Using pandas to get table data.

We have already briefly seen this in action!

If the data on the website you are interested in is already written in a table then Pandas can grab that data and save it to a data frame.

Here is an example of how you could get data about a sports team.

I am a fan of the Las Vegas Aces Basketball team and I might want to know more about the stats of their players. Check out the website:

basketball-reference.com/wnba/teams/SAS/players.html

You can see that the data here is already in a table!

my_website = "https://www.basketball-reference.com/wnba/teams/SAS/players.html"
DF = pd.read_html(my_website)
# Let's see what Pandas got for us:
ACES = DF[0].copy()
ACES
Rk Player From To Yrs Unnamed: 5 G MP FG FGA ... PTS Unnamed: 22 FG% 3P% FT% Unnamed: 26 MP.1 PTS.1 TRB.1 AST.1
0 1 Danielle Adams 2011 2015 5 NaN 155 3247 624 1472 ... 1771 NaN 0.424 0.328 0.754 NaN 20.9 11.4 4.3 0.9
1 2 Elisa Aguilar 2002 2002 1 NaN 28 141 14 33 ... 43 NaN 0.424 0.524 0.571 NaN 5.0 1.5 0.4 0.6
2 3 Kayla Alexander 2013 2017 5 NaN 154 2038 278 555 ... 692 NaN 0.501 NaN 0.764 NaN 13.2 4.5 3.1 0.3
3 4 Lindsay Allen 2018 2020 2 NaN 45 642 56 139 ... 144 NaN 0.403 0.212 0.735 NaN 14.3 3.2 1.2 2.7
4 5 Chantelle Anderson 2005 2007 3 NaN 68 1168 152 310 ... 384 NaN 0.490 NaN 0.777 NaN 17.2 5.6 2.9 0.4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
146 147 Nevriye Yilmaz 2004 2004 1 NaN 7 77 6 24 ... 19 NaN 0.250 0.143 1.000 NaN 11.0 2.7 1.4 0.3
147 148 Jackie Young 2019 2024 6 NaN 199 5944 965 2075 ... 2685 NaN 0.465 0.386 0.852 NaN 29.9 13.5 4.0 4.0
148 149 Sophia Young-Malcolm 2006 2015 9 NaN 301 9258 1659 3545 ... 4300 NaN 0.468 0.223 0.718 NaN 30.8 14.3 6.0 1.8
149 150 Tamera Young 2018 2019 2 NaN 67 1509 197 495 ... 507 NaN 0.398 0.310 0.680 NaN 22.5 7.6 4.4 2.4
150 151 Shanna Zolman 2006 2009 3 NaN 87 1273 225 567 ... 635 NaN 0.397 0.402 0.811 NaN 14.6 7.3 1.1 0.8

151 rows × 31 columns

Ask some questions:

  1. Do I know what every column means and what kind of data is in that column?

  2. Are there certain columns I am interested in?

  3. What are all these “Unnamed” columns and do I need them?

  4. Can I come up with some Data focused questions?

ACES.shape
(151, 31)
ACES.columns
Index(['Rk', 'Player', 'From', 'To', 'Yrs', 'Unnamed: 5', 'G', 'MP', 'FG',
       'FGA', '3P', '3PA', 'FT', 'FTA', 'ORB', 'TRB', 'AST', 'STL', 'BLK',
       'TOV', 'PF', 'PTS', 'Unnamed: 22', 'FG%', '3P%', 'FT%', 'Unnamed: 26',
       'MP.1', 'PTS.1', 'TRB.1', 'AST.1'],
      dtype='object')
columns = ['Unnamed: 5', 'Unnamed: 22','Unnamed: 26']
ACES[columns]
Unnamed: 5 Unnamed: 22 Unnamed: 26
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
... ... ... ...
146 NaN NaN NaN
147 NaN NaN NaN
148 NaN NaN NaN
149 NaN NaN NaN
150 NaN NaN NaN

151 rows × 3 columns

Dealing with NaNs

So there is nothing in the unnamed columns! Are all of the rows NaN?

There are some really nice commands for dealing with NaNs in your data (a short sketch follows this list):

  • First, NaNs are a strange data type (np.nan): they are considered a float - like a decimal. In most raw data sets NaN means no data was given for that observation and variable, but be careful: NaN can also appear from a bad calculation, like dividing zero by zero.

  • .isna() creates a True/False mask showing where the NaNs are in your data.

  • .fillna() will replace every NaN in your data set with whatever you put inside the parentheses.

  • .dropna() will drop all rows that contain NaN - be careful with this command. You want to keep as much data as possible and .dropna() might delete too much!
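
Here is a tiny sketch of these commands on a made-up data frame (the numbers are invented purely for illustration):

import numpy as np
import pandas as pd

# A toy data frame with a few missing values
toy = pd.DataFrame({"points": [10, np.nan, 7], "assists": [np.nan, np.nan, 4]})

toy.isna()         # True/False mask showing where the NaNs are
toy.isna().sum()   # how many NaNs are in each column
toy.fillna(0)      # replace every NaN with 0 (returns a new data frame)
toy.dropna()       # keep only rows with no NaNs - here just the last row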

ACES.dropna(inplace=True)
ACES
Rk Player From To Yrs Unnamed: 5 G MP FG FGA ... PTS Unnamed: 22 FG% 3P% FT% Unnamed: 26 MP.1 PTS.1 TRB.1 AST.1

0 rows × 31 columns

Oh No! We deleted our data!!!!

We got rid of any row that contains NaN, but if we look at our unnamed columns, they are all NaNs!

Let’s reload the data and try again

# Get the data again
ACES = DF[0].copy()
ACES[columns]
Unnamed: 5 Unnamed: 22 Unnamed: 26
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
... ... ... ...
146 NaN NaN NaN
147 NaN NaN NaN
148 NaN NaN NaN
149 NaN NaN NaN
150 NaN NaN NaN

151 rows × 3 columns

ACES[columns].isna().sum()
Unnamed: 5     151
Unnamed: 22    151
Unnamed: 26    151
dtype: int64

Now we see that these columns are entirely NaN (the counts equal the number of rows), so we can simply ignore these columns in our future data analysis - or drop them, as sketched below.
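
If you would rather remove them outright, a short sketch (reusing the columns list we made above) is to drop them by name:

# Drop the all-NaN columns instead of just ignoring them
ACES = ACES.drop(columns=columns)
ACES.shape   # now 151 rows and 28 columns - three fewer than before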

This was an example of when Pandas worked well!

Sometimes you get errors!

  • You can try installing the packages that Python asks for, but some websites use pretty advanced coding.
  • When you see a 403 Forbidden error, it means the website is trying to stop you from scraping and you would need even more advanced techniques (one possible workaround is sketched after the example below)!
website = 'https://www.scrapethissite.com/pages/simple/'
df = pd.read_html(website)
HTTPError: HTTP Error 403: Forbidden
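
One workaround that sometimes helps (a sketch only, and only appropriate when the site's policies allow scraping) is to download the page yourself with requests, sending an identifying User-Agent string, and then hand the HTML text to pd.read_html. Note this only works for pages that actually contain a <table>; the page above does not, which is one more reason the next section switches to Beautiful Soup.

from io import StringIO
import requests

# Placeholder URL and contact info - swap in a table-based page you are allowed to scrape
table_url = "https://www.basketball-reference.com/wnba/teams/SAS/players.html"
headers = {"User-Agent": "DATA101 student scraper (contact: your_email@example.com)"}

response = requests.get(table_url, headers=headers)
tables = pd.read_html(StringIO(response.text))   # parse the HTML we already downloaded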

Using Beautiful Soup to get HTML code

The code below will work well for static websites - aka sites that don't use JavaScript to actively load content.

Websites are built using HTML code. That code tells the web browser (Firefox, Chrome, etc.) what to display. Websites range from very simple (just HTML) to much more complicated (JavaScript and more). When you load a website you can always see the source code:

  • Right Click - view page source

This is the code that requests downloads and Beautiful Soup parses. For static (simple) sites this code is immediately available. More complicated sites might require Python to open the webpage, let the content render, and then download the code.

How to get data from static sites:

You should already have the packages bs4 and requests but if you get an error try running:

!conda install -y beautifulsoup4
!conda install -y requests
import requests
from bs4 import BeautifulSoup
website = 'https://www.scrapethissite.com/pages/simple/'
raw_code = requests.get(website)
html_doc = raw_code.text
soup = BeautifulSoup(html_doc, 'html.parser')

Let's see what is in soup

Uncomment this line and run the cell. You will see a TON of text!

#soup

This is HTML code

    <!DOCTYPE html>
    
    <html lang="en">
    <head>
    <meta charset="utf-8"/>
    <title>Countries of the World: A Simple Example | Scrape This Site | A public sandbox for learning web scraping</title>
    <link href="/static/images/scraper-icon.png" rel="icon" type="image/png"/>
    <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
    <meta content="A single page that lists information about all the countries in the world. Good for those just get started with web scraping." name="description"/>
    <link crossorigin="anonymous" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css" integrity="sha256-MfvZlkHCEqatNoGiOXveE8FIwMzZg4W85qfrfIFBfYc= sha512-dTfge/zgoMYpP7QbHy4gWMEGsbsdZeCXz7irItjcC3sPUFtf0kuFbDz/ixG7ArTxmDjLXDmezHubeNikyKGVyQ==" rel="stylesheet"/>
    <link href="https://fonts.googleapis.com/css?family=Lato:400,700" rel="stylesheet" type="text/css"/>
    <link href="/static/css/styles.css" rel="stylesheet" type="text/css"/>
    <meta content="noindex" name="robots"/>
    <link href="https://lipis.github.io/flag-icon-css/css/flag-icon.css" rel="stylesheet"/>
    </head>
    <body>
    <nav id="site-nav">
    <div class="container">
    <div class="col-md-12">
    <ul class="nav nav-tabs">
    <li id="nav-homepage">
    <a class="nav-link hidden-sm hidden-xs" href="/">
    <img id="nav-logo" src="/static/images/scraper-icon.png"/>
                                    Scrape This Site
                                </a>
    </li>
    <li id="nav-sandbox">
    <a class="nav-link" href="/pages/">
    <i class="glyphicon glyphicon-console hidden-sm hidden-xs"></i>
                                    Sandbox
                                </a>

WHAT A MESS!

The information in soup is ALL of the code and unless you are awesome at reading HTML, this is indecipherable. We need to be able to find specific parts of this to extract the data.

Extracting data from HTML:

We will use the soup.find_all() function.

Here is the simplified function signature:

soup.find_all(name=None,attrs={})

You can type soup.find_all? and run it to see all the information about additional arguments and advanced processes.

Here is how we will mostly use it, but there are much more advanced things you can do:

soup.find_all( <type of section>, <info> )

The .find_all() function searches through the information in soup and returns only the sections that match what you asked for. Here are some important things you might search for (a quick sketch of these options follows the list):

  • ‘h2’ - this is a heading

  • ‘div’ - this divides a block of information

  • ‘span’ - this divides inline information

  • ‘a’ - this specifies a hyperlink

  • ‘li’ - this is a list item

  • class_= - many things have the class label (notice the underscore!)

  • string= - you can also search by strings.
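
As a quick sketch of how these options combine (using the soup object we built above from the countries page - the exact results depend on that page's code), calls look like:

import re

soup.find_all('h2')                              # every h2 heading on the page
soup.find_all('a')                               # every hyperlink
soup.find_all('div', class_="country")           # every div whose class includes "country"
soup.find_all(string=re.compile("Andorra"))      # text pieces containing "Andorra"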

Using Developer tools:

To figure out what data to extract I suggest you use developer tools on the website to find what you need. Navigate to the website:

Scrape This Site - Website

I really like Brave Browser or Google Chrome for this, but most browsers will have More Tools/Developer Tools where you can see the code.

Using developer tools you can decide what you are looking for. Let's say I want the country information; here is an example of the code that contains it:

                <div class="col-md-4 country">
                    <h3 class="country-name">
                        <i class="flag-icon flag-icon-ad"></i>
                        Andorra
                    </h3>
                    <div class="country-info">
                        <strong>Capital:</strong> <span class="country-capital">Andorra la Vella</span><br>
                        <strong>Population:</strong> <span class="country-population">84000</span><br>
                        <strong>Area (km<sup>2</sup>):</strong> <span class="country-area">468.0</span><br>
                    </div>
                </div>

So I can use soup.find_all() to find pieces of this information!

Search for all the country names

The names of the countries are inside

    <h3 class="country-name">

So let's search for this:

result = soup.find_all('h3',class_="country-name")
#print(result)

This is still a mess

[<h3 class="country-name">
<i class="flag-icon flag-icon-ad"></i>
                            Andorra
                        </h3>, <h3 class="country-name">
<i class="flag-icon flag-icon-ae"></i>
                            United Arab Emirates
                        </h3>, <h3 class="country-name">
<i class="flag-icon flag-icon-af"></i>
                            Afghanistan
                        </h3>, <h3 class="country-name">
<i class="flag-icon flag-icon-ag"></i>
                            Antigua and Barbuda
                        </h3>, <h3 class="country-name">
<i class="flag-icon flag-icon-ai"></i>
                            Anguilla
                        </h3>, <h3 class="country-name">
<i class="flag-icon flag-icon-al"></i>
                            Albania

We can see the country names but they are surrounded by other junk. Here is how we will handle this.

  1. We will start an EMPTY data frame using

    DF = pd.DataFrame()
  2. We will add our soup.find_all results as a column of the data frame

  3. We will fix the data to strip off all of the unneeded text.

How to get the text

The find_all search returns a list of all the matching blocks of code. If we just want the text that is inside a block, we can get it!

result[0]
<h3 class="country-name">
<i class="flag-icon flag-icon-ad"></i>
                            Andorra
                        </h3>
result[0].text
'\n\n                            Andorra\n                        '
result[0].text.lstrip().rstrip()
'Andorra'

Put this into a data frame

Now that we see what we need to do to get the data in the form we want, we can use a data frame and a lambda to get the data into a nice format.

DF = pd.DataFrame()
DF['country']=result
DF['country'] =DF['country'].apply(lambda x: x.text.rstrip().lstrip())
DF
country
0 Andorra
1 United Arab Emirates
2 Afghanistan
3 Antigua and Barbuda
4 Anguilla
... ...
245 Yemen
246 Mayotte
247 South Africa
248 Zambia
249 Zimbabwe

250 rows × 1 columns

Let's try again and this time add the country capital

If we look at the code we see the country capital is inside code that looks like this:

<strong>Capital:</strong> <span class="country-capital">Andorra la Vella</span><br>

so let's search for a span where the class is "country-capital"

result = soup.find_all('span',class_="country-capital")
result[0]
<span class="country-capital">Andorra la Vella</span>
result[0].text
'Andorra la Vella'
DF['capital']=result
DF['capital'] = DF['capital'].apply(lambda x: x.text)
DF
country capital
0 Andorra Andorra la Vella
1 United Arab Emirates Abu Dhabi
2 Afghanistan Kabul
3 Antigua and Barbuda St. John's
4 Anguilla The Valley
... ... ...
245 Yemen Sanaa
246 Mayotte Mamoudzou
247 South Africa Pretoria
248 Zambia Lusaka
249 Zimbabwe Harare

250 rows × 2 columns

Your turn to try this!

Q Now it’s your turn. See if you can write code that gets the population and area information into the data frame. See if you can make your example match what I get below, including having the correct data types. Population should be an int and area should be a float.

Your goal is to get a data frame that looks like this:

country capital population area
0 Andorra Andorra la Vella 84000 468.0
1 United Arab Emirates Abu Dhabi 4975593 82880.0
2 Afghanistan Kabul 29121286 647500.0
3 Antigua and Barbuda St. John's 86754 443.0
4 Anguilla The Valley 13254 102.0
... ... ... ... ...
245 Yemen Sanaa 23495361 527970.0
246 Mayotte Mamoudzou 159042 374.0
247 South Africa Pretoria 49000000 1219912.0
248 Zambia Lusaka 13460305 752614.0
249 Zimbabwe Harare 11651858 390580.0

250 rows × 4 columns

DF.dtypes
country        object
capital        object
population      int64
area          float64
dtype: object

Where can you get more practice

Here is a website dedicated to allowing students to practice web scraping:

www.scrapethissite.com

There are "sandbox" websites that are intended for scraping. The only thing the site asks is:

Please be Well-Behaved

Just like any site you’d scrape out in the wild wild west (www), please be mindful of other users trying to access the site. From a technical standpoint, you must observe the following rules:

  • Clients may only make a maximum of one request per second
  • Clients must send an identifying user agent
  • Clients must respect this site’s robots.txt file

Any client that violates the rules above or otherwise tries to interfere with the site’s operation will be subject to a temporary or permanent ban.

Be a good web scraping citizen.

You try scraping (with help)

We are going to scrape the site

www.scrapethissite.com

to get the list of all links. Open the website and look at the developer tools. You should see something like:

<div class="page">
    <h3 class="page-title">
        <a href="/pages/simple/">Countries of the World: A Simple Example</a>
    </h3>
    <p class="lead session-desc">
        A single page that lists information about all the countries in the world. Good for those just get started with web scraping.
    </p>
    <hr>
</div>

for each link on the website.

Here is our goal: Make a pandas data frame that contains three columns:

  • “site_name” - which contains just the words of the link
  • “link” - which contains just the website part of the link
  • “description” - which contains the words below the link

When looking for links you can use

result = soup.find_all('a')
result[0].text # Get the text associated with the link
result[0].get('href') # Get the link location

Exercise 1:

See if you can do this without looking at my code in the notes! What would your plan have to be?

  1. Get the soup and find_all to get links (notice the first few entries are not helpful)
  2. make an empty data frame
  3. add a new column to contain the link text
  4. fix that text so it looks nice
  5. add a new column to contain the actual link
  6. fix that text so it looks nice
  7. do another find all to find the words
  8. add a new column to contain the words
  9. fix the text so it looks nice

The answer (below)

but seriously try this on your own

before

you

look at

the

ANSWER

# Get the soup
website = 'https://www.scrapethissite.com/pages/'
raw_code = requests.get(website)
html_doc = raw_code.text
soup = BeautifulSoup(html_doc, 'html.parser')
# Search the soup for links
result = soup.find_all('a')
# Look at what we end up with and fix it till it looks good
result[0].text.lstrip().rstrip()
'Scrape This Site'
# The first few are from the top of the page not the main content
result[5].text.lstrip().rstrip()
'Countries of the World: A Simple Example'

Notice:

I don't get the links I want until index 5 of the results... I might have to remove these first few rows from my data eventually!

# Create a data frame and add the site names
DF = pd.DataFrame()
DF['site_name'] = result
DF['site_name'] = DF['site_name'].apply(lambda x: x.text.lstrip().rstrip())
DF
site_name
0 Scrape This Site
1 Sandbox
2 Lessons
3 FAQ
4 Login
5 Countries of the World: A Simple Example
6 Hockey Teams: Forms, Searching and Pagination
7 Oscar Winning Films: AJAX and Javascript
8 Turtles All the Way Down: Frames & iFrames
9 Advanced Topics: Real World Challenges You'll ...
# Look at what we ended up with but this time get the links
result[0].get('href')
'/'
result[5].get('href')
'/pages/simple/'
# Add these to the data frame
DF['link'] = result
DF['link'] = DF['link'].apply(lambda x: x.get('href'))
DF
site_name link
0 Scrape This Site /
1 Sandbox /pages/
2 Lessons /lessons/
3 FAQ /faq/
4 Login /login/
5 Countries of the World: A Simple Example /pages/simple/
6 Hockey Teams: Forms, Searching and Pagination /pages/forms/
7 Oscar Winning Films: AJAX and Javascript /pages/ajax-javascript/
8 Turtles All the Way Down: Frames & iFrames /pages/frames/
9 Advanced Topics: Real World Challenges You'll ... /pages/advanced/
# EXTRA - let's add the base website info
base_website = 'https://www.scrapethissite.com'
DF['link'] = DF['link'].apply(lambda x: base_website+x)
DF
site_name link
0 Scrape This Site https://www.scrapethissite.com/
1 Sandbox https://www.scrapethissite.com/pages/
2 Lessons https://www.scrapethissite.com/lessons/
3 FAQ https://www.scrapethissite.com/faq/
4 Login https://www.scrapethissite.com/login/
5 Countries of the World: A Simple Example https://www.scrapethissite.com/pages/simple/
6 Hockey Teams: Forms, Searching and Pagination https://www.scrapethissite.com/pages/forms/
7 Oscar Winning Films: AJAX and Javascript https://www.scrapethissite.com/pages/ajax-java...
8 Turtles All the Way Down: Frames & iFrames https://www.scrapethissite.com/pages/frames/
9 Advanced Topics: Real World Challenges You'll ... https://www.scrapethissite.com/pages/advanced/
# Remove the top few links
DF = DF.drop([0,1,2,3,4])
DF
site_name link
5 Countries of the World: A Simple Example https://www.scrapethissite.com/pages/simple/
6 Hockey Teams: Forms, Searching and Pagination https://www.scrapethissite.com/pages/forms/
7 Oscar Winning Films: AJAX and Javascript https://www.scrapethissite.com/pages/ajax-java...
8 Turtles All the Way Down: Frames & iFrames https://www.scrapethissite.com/pages/frames/
9 Advanced Topics: Real World Challenges You'll ... https://www.scrapethissite.com/pages/advanced/
# Now search the soup for the words
# They are inside p class="lead session-desc"
result = soup.find_all('p',class_="lead session-desc")
result[0].text.rstrip().lstrip()
'A single page that lists information about all the countries in the world. Good for those just get started with web scraping.'
DF['description'] = result
DF['description'] = DF['description'].apply(lambda x: x.text.rstrip().lstrip())
DF
site_name link description
5 Countries of the World: A Simple Example https://www.scrapethissite.com/pages/simple/ A single page that lists information about all...
6 Hockey Teams: Forms, Searching and Pagination https://www.scrapethissite.com/pages/forms/ Browse through a database of NHL team stats si...
7 Oscar Winning Films: AJAX and Javascript https://www.scrapethissite.com/pages/ajax-java... Click through a bunch of great films. Learn ho...
8 Turtles All the Way Down: Frames & iFrames https://www.scrapethissite.com/pages/frames/ Some older sites might still use frames to bre...
9 Advanced Topics: Real World Challenges You'll ... https://www.scrapethissite.com/pages/advanced/ Scraping real websites, you're likely run into...

Challenge Problem

Here is another website to scrape. See if you can create a data frame that looks like the one below. Notice that you can only scrape the first page.

If you want to try scraping the other pages, you have to notice how the website updates its address for each page. Then write a for loop to loop through however many pages you want to scrape, do the same set of operations for each page, and keep adding data to your data frame. A skeleton of this loop is sketched below.
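
A minimal skeleton of that loop might look like the sketch below. It reuses requests, BeautifulSoup, and pandas from earlier in these notes, it assumes the later pages live at addresses like catalogue/page-2.html (verify that pattern yourself by clicking "next" on the site), and it leaves the actual scraping steps for you to fill in.

all_pages = []

for page_number in range(1, 4):   # however many pages you want to scrape
    # Assumed address pattern - check it against the site's "next" button
    page_url = f'https://books.toscrape.com/catalogue/page-{page_number}.html'
    raw_code = requests.get(page_url)
    page_soup = BeautifulSoup(raw_code.text, 'html.parser')

    # ... do the same find_all / clean-up steps here and build a page data frame ...
    # all_pages.append(page_DF)

# DF = pd.concat(all_pages, ignore_index=True)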

Make a histogram of your final data.

website='https://books.toscrape.com/index.html'
raw_code = requests.get(website)
html_doc = raw_code.text
soup = BeautifulSoup(html_doc, 'html.parser')

Try to scrape the name, the link to the book, and the prices! I decided to put the name and link information into a single column and then break that apart.

names_links links names price
0 [[A Light in the ...]] catalogue/a-light-in-the-attic_1000/index.html A Light in the ... 51.77
1 [[Tipping the Velvet]] catalogue/tipping-the-velvet_999/index.html Tipping the Velvet 53.74
2 [[Soumission]] catalogue/soumission_998/index.html Soumission 50.10
3 [[Sharp Objects]] catalogue/sharp-objects_997/index.html Sharp Objects 47.82
4 [[Sapiens: A Brief History ...]] catalogue/sapiens-a-brief-history-of-humankind... Sapiens: A Brief History ... 54.23
5 [[The Requiem Red]] catalogue/the-requiem-red_995/index.html The Requiem Red 22.65
6 [[The Dirty Little Secrets ...]] catalogue/the-dirty-little-secrets-of-getting-... The Dirty Little Secrets ... 33.34
7 [[The Coming Woman: A ...]] catalogue/the-coming-woman-a-novel-based-on-th... The Coming Woman: A ... 17.93
8 [[The Boys in the ...]] catalogue/the-boys-in-the-boat-nine-americans-... The Boys in the ... 22.60
9 [[The Black Maria]] catalogue/the-black-maria_991/index.html The Black Maria 52.15
10 [[Starving Hearts (Triangular Trade ...]] catalogue/starving-hearts-triangular-trade-tri... Starving Hearts (Triangular Trade ... 13.99
11 [[Shakespeare's Sonnets]] catalogue/shakespeares-sonnets_989/index.html Shakespeare's Sonnets 20.66
12 [[Set Me Free]] catalogue/set-me-free_988/index.html Set Me Free 17.46
13 [[Scott Pilgrim's Precious Little ...]] catalogue/scott-pilgrims-precious-little-life-... Scott Pilgrim's Precious Little ... 52.29
14 [[Rip it Up and ...]] catalogue/rip-it-up-and-start-again_986/index.... Rip it Up and ... 35.02
15 [[Our Band Could Be ...]] catalogue/our-band-could-be-your-life-scenes-f... Our Band Could Be ... 57.25
16 [[Olio]] catalogue/olio_984/index.html Olio 23.88
17 [[Mesaerion: The Best Science ...]] catalogue/mesaerion-the-best-science-fiction-s... Mesaerion: The Best Science ... 37.59
18 [[Libertarianism for Beginners]] catalogue/libertarianism-for-beginners_982/ind... Libertarianism for Beginners 51.33
19 [[It's Only the Himalayas]] catalogue/its-only-the-himalayas_981/index.html It's Only the Himalayas 45.17
# DF_plot below stands for whatever you named your final book data frame
fig = px.histogram(DF_plot, x='price', color='names')

fig.show()