The Python Book
 
strip_html html
20141130

Strip HTML tags from a string of text.

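# Python 3 renamed the HTMLParser module to html.parser.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2

class MLStripper(HTMLParser):
    """Collect only the text found between tags."""
    def __init__(self):
        # Run the base initializer; reset() alone skips attributes Python 3 needs.
        HTMLParser.__init__(self)
        self.fed = []
    def handle_data(self, d):
        # Called for each run of text outside the tags.
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    """Return html with all tags removed."""
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text still buffered by the parser
    return s.get_data()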


txt='''
<span class="mw-headline" id="The_K.C3.B6ln_concert">The Köln concert </span>
<span class="mw-editsection"><span class="mw-editsection-bracket">
[</span><a href="/w/index.php?title=The_K%C3%B6ln_Concert&amp;action=edit&amp;section=1" 
title="Edit section: The Köln concert">edit</a><span class="mw-editsection-bracket">]</span>
</span></h2>
<p>The concert was organized by 17-year-old 
Vera Brandes, then Germany ’s youngest concert promoter.<sup id="cite_ref-5" class="reference">
<a href="#cite_note-5"><span>[</span>5<span>]</span></a></sup> At Jarrett's request, Brandes 
had selected a <a href="/wiki/B%C3%B6sendorfer" title="Bösendorfer">Bösendorfer</a> 
290 Imperial concert grand piano for the performance. 
'''

print(strip_tags(txt))

Output:

The Köln concert 

[edit]

The concert was organized by 17-year-old 
Vera Brandes, then Germany ’s youngest concert promoter.
[5] At Jarrett's request, Brandes 
had selected a Bösendorfer 
290 Imperial concert grand piano for the performance. 

As found on: Stack Overflow
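
One limitation worth noting: the parser also passes the contents of <script> and <style> elements to handle_data, so a stripper that keeps every data chunk would let JavaScript and CSS leak into the result. The following is a minimal sketch of one way around that, assuming Python 3. The names TextOnlyStripper, strip_tags_text_only and the SKIPPED set are hypothetical and not part of the original snippet.

```python
from html.parser import HTMLParser  # Python 3 module name


class TextOnlyStripper(HTMLParser):
    """Collect text, ignoring anything inside <script> or <style>."""

    # Hypothetical choice of elements whose content is dropped.
    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)

    def get_data(self):
        return "".join(self.parts)


def strip_tags_text_only(html):
    parser = TextOnlyStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_data()


sample = ("<p>Tom &amp; <b>Jerry</b></p>"
          "<script>var x = 1;</script><style>p { color: red; }</style>")
print(strip_tags_text_only(sample))
```

The counter is used instead of a boolean only so that an unexpected extra closing tag cannot push the state below zero. For untrusted input a dedicated sanitizing library is probably the safer choice.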

 
Notes by Willem Moors. Generated on momo:/home/willem/sync/20151223_datamungingninja/pythonbook at 2019-07-31 19:22