From 6e4eb1f9a659a1875e6582fdd9bb5341ebaa5f14 Mon Sep 17 00:00:00 2001
From: Thomas Levine <_@thomaslevine.com>
Date: Wed, 4 Nov 2015 20:00:59 -0500
Subject: [PATCH] use .content in lxml

This can also be thought of as a bug in lxml. It just occurred to me
that maybe I should fix that.
---
 docs/scenarios/scrape.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/scenarios/scrape.rst b/docs/scenarios/scrape.rst
index 65560d3da..0728d2592 100644
--- a/docs/scenarios/scrape.rst
+++ b/docs/scenarios/scrape.rst
@@ -38,7 +38,10 @@ parse it using the ``html`` module and save the results in ``tree``:
 .. code-block:: python
 
     page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
-    tree = html.fromstring(page.text)
+    tree = html.fromstring(page.content)
+
+(We need to use ``page.content`` rather than ``page.text`` because
+``html.fromstring`` implicitly expects ``bytes`` as input.)
 
 ``tree`` now contains the whole HTML file in a nice tree structure which
 we can go over two different ways: XPath and CSSSelect. In this example, we