The lxml
module in Python is a powerful and feature-rich library for processing XML and HTML documents. It provides a comprehensive API for parsing, creating, and manipulating XML and HTML data. It is widely used due to its performance, ease of use, and extensive functionality.
1. Installation
To use the lxml
module, you need to install it. You can do so using pip
:
pip install lxml
2. Basic Usage
Here are some basic operations you can perform with the lxml
module:
Parsing XML
from lxml import etree
# Parse XML from a string
xml_string = '''
Content 1
Content 2
'''
root = etree.fromstring(xml_string)
# Print the XML structure
print(etree.tostring(root, pretty_print=True).decode())
Creating XML
from lxml import etree
# Create XML elements
root = etree.Element("root")
child1 = etree.SubElement(root, "child", name="child1")
child1.text = "Content 1"
child2 = etree.SubElement(root, "child", name="child2")
child2.text = "Content 2"
# Convert to string
xml_str = etree.tostring(root, pretty_print=True).decode()
print(xml_str)
Parsing HTML
from lxml import html
# Parse HTML from a string
html_string = '''
Title
This is a paragraph.
'''
tree = html.fromstring(html_string)
# Extract content
title = tree.xpath('//h1/text()')[0]
print("Title:", title)
3. XPath Queries
The lxml
module supports XPath, which allows you to query and navigate XML and HTML documents. Here’s an example:
from lxml import etree
# Parse XML
xml_string = '''
- Item 1
- Item 2
'''
root = etree.fromstring(xml_string)
# XPath query
items = root.xpath('//item/text()')
print("Items:", items)
4. XSLT Transformations
With lxml
, you can also perform XSLT transformations:
from lxml import etree
# Parse XML and XSLT
xml_string = '''
- Item 1
- Item 2
'''
xslt_string = '''
Items
'''
xml_root = etree.fromstring(xml_string)
xslt_root = etree.fromstring(xslt_string)
transform = etree.XSLT(xslt_root)
result = transform(xml_root)
print(result)
5. Conclusion
The lxml
module is a versatile and powerful tool for working with XML and HTML data in Python. Its support for XPath and XSLT, along with its ability to handle large documents efficiently, makes it an excellent choice for many applications involving structured data.