@@ -1215,48 +1215,49 @@ In addition to the examples below, more examples are given in
:ref:`urllib-howto`.

This example gets the python.org main page and displays the first 300 bytes of
- it. ::
+ it::

>>> import urllib.request
>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(300))
...
- b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
- xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
- <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
- <title>Python Programming '
+ b'<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <html class="no-js ie8 lt-ie9">

Note that urlopen returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the HTTP server. In general, a program will decode
the returned bytes object to string once it determines or guesses
the appropriate encoding.

- The following W3C document, https://www.w3.org/International/O-charset\ , lists
- the various ways in which an (X)HTML or an XML document could have specified its
+ The following HTML spec document, https://html.spec.whatwg.org/#charset, lists
+ the various ways in which an HTML or an XML document could have specified its
encoding information.

+ For additional information, see the W3C document: https://www.w3.org/International/questions/qa-html-encoding-declarations.
+
As the python.org website uses *utf-8* encoding as specified in its meta tag, we
- will use the same for decoding the bytes object. ::
+ will use the same for decoding the bytes object::

>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(100).decode('utf-8'))
...
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtm
+ <!doctype html>
+ <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+ <!-

It is also possible to achieve the same result without using the
- :term:`context manager` approach. ::
+ :term:`context manager` approach::

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> try:
...     print(f.read(100).decode('utf-8'))
... finally:
...     f.close()
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
- "http://www.w3.org/TR/xhtml1/DTD/xhtm
+ ...
+ <!doctype html>
+ <!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->
+ <!--

In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
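The patched text says that a program decodes the returned bytes once it determines or guesses the encoding, and the examples hard-code `'utf-8'`. One way that step might look is sketched below, assuming the server declares a charset in its `Content-Type` header. The helper name `decode_body` and its `fallback` parameter are hypothetical and are not part of `urllib.request`; the sketch does not inspect `<meta>` tags in the document itself.

```python
from email.message import Message
from typing import Optional


def decode_body(body: bytes, content_type: Optional[str], fallback: str = "utf-8") -> str:
    """Decode body with the charset named in a Content-Type value, else fallback.

    Hypothetical helper for illustration; not part of urllib.request.
    """
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    charset = msg.get_content_charset() or fallback
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: fall back, replacing bad bytes.
        return body.decode(fallback, errors="replace")


print(decode_body(b"<!doctype html>", "text/html; charset=utf-8"))
```

With a live response `f` from `urllib.request.urlopen`, the header value could be passed as `f.headers.get("Content-Type")`. Since `f.headers` is an `email.message.Message` subclass, `f.headers.get_content_charset()` should yield the same charset directly.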