This overview discusses the key upgrades in Python 1.6 concerning XML and Unicode support. It highlights the contributions from PythonLabs, focusing on the introduction of the Expat interface for XML parsing, deprecation of older modules like xmllib, and the addition of Unicode functionality directly in the core language. The document emphasizes community involvement in development and the importance of backward compatibility, while also covering the PythonLabs team's commitment to open-source improvements and updated API support for modern standards.
Python, XML, and PythonLabs Fred L. Drake, Jr. fdrake@beopen.com BeOpen.com
Outline
• Python 1.6 and XML
  • What does Python offer XML users in release 1.6?
• PythonLabs at BeOpen.com
  • What does the formation of PythonLabs mean for Python?
Python 1.5.* and SGML, XML
• sgmllib, htmllib
  • Just enough SGML to work with HTML-as-deployed … somewhat.
  • Dispatcher model usable for small projects (SAX-like).
  • Does not process any DTD information.
• xmllib
  • Simple XML support for ASCII-only element and attribute names.
  • Namespace support, but difficult to use.
  • Shares its dispatch model with sgmllib and htmllib, so it is familiar to the existing user base.
  • Not XML 1.0 compliant.
  • No Unicode support.
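The dispatcher model mentioned above can be sketched roughly as follows. sgmllib is not available in current Python releases, so this sketch assumes a Python 3 interpreter and builds the name-based dispatch on top of the standard `html.parser.HTMLParser`; the class name `TagDispatcher` and the `start_<tag>`/`end_<tag>` lookup are illustrative and not the original sgmllib code.

```python
from html.parser import HTMLParser


class TagDispatcher(HTMLParser):
    """Route each tag to a start_<tag>/end_<tag> method, if one is defined.

    Hypothetical class that imitates the per-element dispatch style of
    sgmllib/htmllib; it is not the original implementation.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Look up a per-element handler by name; tags without one are ignored.
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

    def start_a(self, attrs):
        # attrs is a list of (name, value) pairs.
        for name, value in attrs:
            if name == "href":
                self.links.append(value)


parser = TagDispatcher()
parser.feed('<p>See <a href="http://www.python.org/">python.org</a>.</p>')
parser.close()
print(parser.links)
```

Adding support for another element only means defining another `start_<tag>` method, which is why the model suits small projects.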
Python 1.6 and XML
• Existing modules remain for backward compatibility
  • But xmllib is deprecated.
• Expat interface is included in standard distributions
  • Can generate UTF-8 or UTF-16.
  • Installed by default on Windows.
  • Add-on package for Linux (RPMs, etc.); probably installed by default on common distributions.
  • Requires getting & building Expat separately when building from source.
  • Jack Jansen, Paul Prescod, Andrew Kuchling.
• SAX 2 Interface
  • Contributed by Lars Marius Garshol.
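As a rough illustration of the Expat interface, the sketch below uses the `xml.parsers.expat` module as it ships with a current Python 3 interpreter (an assumption; the slide describes the 1.6-era interface). The sample document and the `events` list are made up for the example.

```python
import xml.parsers.expat

# Collected parse events; a hypothetical structure for this example only.
events = []


def start_element(name, attrs):
    # attrs arrives as a dict of attribute names to values.
    events.append(("start", name, attrs))


def end_element(name):
    events.append(("end", name))


def char_data(data):
    # Expat may deliver text in several chunks, so handlers should accumulate.
    events.append(("text", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

document = '<?xml version="1.0"?><talk speaker="Fred"><title>Python and XML</title></talk>'
parser.Parse(document, True)  # True marks this as the final chunk of input

for event in events:
    print(event)
```

The callbacks are plain functions assigned to attributes of the parser object, so no subclassing is needed for simple jobs.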
PyXML Extension Package
• Validating parser
  • 100% Pure Python by Lars Marius Garshol!
• Level 1 DOM
  • Contributed by FourThought, LLC.
• Many convenience modules
  • Build DOM documents from ESIS streams.
  • ISO 8601 date format support.
  • SAX handler classes to dump a nicely indented XML document.
• Coordinated by Andrew Kuchling
  • A product of the XML Special Interest Group at python.org.
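To give a feel for what a SAX handler that dumps an indented document looks like, here is a minimal sketch. It uses the `xml.sax` package from a current Python 3 standard library rather than the PyXML classes themselves, and the `IndentingDumper` class is hypothetical. It ignores attributes and whitespace-only text for brevity.

```python
import io
import xml.sax
from xml.sax.saxutils import escape


class IndentingDumper(xml.sax.ContentHandler):
    """Write each element and text run on its own line, indented by depth."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.depth = 0

    def startElement(self, name, attrs):
        # Attributes are dropped in this simplified sketch.
        self.out.write("  " * self.depth + "<" + name + ">\n")
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        self.out.write("  " * self.depth + "</" + name + ">\n")

    def characters(self, content):
        text = content.strip()
        if text:
            self.out.write("  " * self.depth + escape(text) + "\n")


out = io.StringIO()
source = b"<deck><slide>Python 1.6 and XML</slide><slide>Unicode Support</slide></deck>"
xml.sax.parseString(source, IndentingDumper(out))
result = out.getvalue()
print(result)
```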
Unicode Support
• Python 1.6 includes Unicode support in the core!
  • In source code: u'abc'
  • From data: unicode('raw data from file', 'iso-8859-5')
  • From file objects: see the code sample on the next slide.
  • Support for over 60 codecs in the standard library.
  • Uses UTF-16 to avoid excess memory consumption; no support beyond the Basic Multilingual Plane.
• Basic string type is still 8-bit characters
  • Avoids breaking legacy code.
Unicode in Files
>>> import codecs
>>> f = codecs.open('test.utf8', 'w', encoding='utf-8')
>>> f.write(u'Marc-Andr\xE9 Lemburg')
>>> f.close()
>>> open('test.utf8').readline()
'Marc-Andr\303\251 Lemburg'
>>> codecs.open('test.utf8', encoding='utf-8').readline()
u'Marc-Andr\351 Lemburg'
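For readers on a current interpreter, the same ideas can be sketched in Python 3, where `str` is already Unicode and the 8-bit type is `bytes`. This is an assumed modern equivalent, not the 1.6 API: `bytes.decode` stands in for `unicode(data, encoding)`, and the built-in `open` with an `encoding` argument stands in for `codecs.open`. The sample bytes and the temporary file path are invented for the example.

```python
import os
import tempfile

# "From data": decode raw 8-bit data using a named codec.
raw = b"\xbf\xe0\xd8\xd2\xd5\xe2"  # six Cyrillic letters encoded as ISO 8859-5
text = raw.decode("iso-8859-5")

# "From file objects": write text as UTF-8, then read it back two ways.
path = os.path.join(tempfile.mkdtemp(), "test.utf8")
with open(path, "w", encoding="utf-8") as f:
    f.write("Marc-Andr\xe9 Lemburg")

with open(path, "rb") as f:
    raw_line = f.readline()  # undecoded bytes, as the 8-bit read shows on the slide

with open(path, encoding="utf-8") as f:
    decoded_line = f.readline()  # decoded text

print(repr(text))
print(repr(raw_line))
print(repr(decoded_line))
```

The raw read shows the two-byte UTF-8 sequence `\xc3\xa9`, which is the same pair the slide prints in octal as `\303\251`.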
Unicode and Regular Expressions
• New regular expression matching engine
  • Supports both Unicode and 8-bit strings.
  • Matches faster than pcre library used in Python 1.5.*.
  • Regular expression compiler is 100% Pure Python.
  • Keeps the Perl-compatible syntax for regular expressions.
  • Written by Fredrik Lundh of Secret Labs, AB.
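A small sketch of one pattern syntax serving both string types, assuming the `re` module of a current Python 3 interpreter (where the two types are `str` and `bytes`, not the 1.6-era types):

```python
import re

name = "Marc-Andr\xe9 Lemburg"

# Unicode pattern: \w matches any Unicode word character, including the accented e.
text_words = re.findall(r"\w+", name)

# 8-bit pattern: \w matches ASCII word characters only, so the two UTF-8
# bytes that encode the accented e are not part of any word.
byte_words = re.findall(rb"\w+", name.encode("utf-8"))

print(text_words)
print(byte_words)
```

The pattern text is identical in both calls; only the type of the pattern and of the subject string changes, and the two must match.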
PythonLabs at BeOpen.com
Who is PythonLabs?
• The old crew from CNRI:
  • Guido van Rossum, the creator of Python
  • Barry Warsaw, maintainer of JPython, MailMan developer
  • Fred Drake, Python’s Documentation Tzar
  • Jeremy Hylton, the pragmatic academician
• And a familiar voice from the community:
  • Tim Peters, the universal expert
Why?
• Core development team will devote full time to Python
  • Core language development & implementation.
  • Community building.
• Extend our efforts to improve development and deployment tools:
  • IDLE (Python IDE using Tk)
  • KDevelop integration?
  • CPAN/CTAN-like repository for 3rd-party packages?
• Improve integration facilities
  • Database API.
  • Web-related APIs should support the latest standards.
• Better visibility in corporate development shops
• Our development efforts will be 100% Open Source
  • All software will have a license that conforms to the Open Source Definition (www.opensource.org).
Late Breaking News