Ideally, the data we wish to work on can be downloaded in an easy-to-use format. Failing that, when we want only a small subset of a very large dataset, or the data is constantly being updated, the owner will hopefully provide an application programming interface (API) to automate collection of the relevant data. Quite often, however, the data cannot be downloaded and there is no API; it is publicly available but dispersed across a website. When it would be too tedious and time-consuming to navigate page by page and collect the data manually, we can use Selenium WebDriver and Beautiful Soup to automate navigating the website and collecting the relevant data. In this code clinic, I will go through best practices (and what not to do!) when web scraping: using Selenium WebDriver to navigate around a website, then using Beautiful Soup to extract the data from the HTML.

The following tools will be used in this code clinic:
Python 3 - https://www.python.org/
Selenium - https://selenium-python.readthedocs.io/
Beautiful Soup - https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Firefox (or whichever web browser you use; you will need the matching driver, which is geckodriver for Firefox) - https://selenium-python.readthedocs.io/installation.html#drivers
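
As a rough illustration of how these tools fit together, here is a minimal sketch (not the material from the clinic itself). It assumes Firefox and geckodriver are installed; the function names fetch_page_source and extract_links are hypothetical and chosen only for this example. Selenium loads the page and hands back the rendered HTML, and Beautiful Soup then pulls the link text and targets out of it.

```python
from bs4 import BeautifulSoup


def fetch_page_source(url):
    """Open `url` in Firefox via Selenium and return the rendered HTML.

    Hypothetical helper for illustration; assumes geckodriver is available.
    """
    # Imported here so extract_links() can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]
```

Calling extract_links(fetch_page_source(url)) on a page you are permitted to scrape would return its links as a list of tuples. Keeping the navigation and parsing steps in separate functions means the parsing logic can be tried out on saved HTML without launching a browser.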

To register, please visit:

https://oxford.onlinesurveys.ac.uk/python-code-clinic-29-april