goebel consult proudly announces
bridgekeeper
============
0.0.1
A Perl to Python source code converter
http://www.crazy-compilers.com/
(Nürnberg/2002-04-01) While 2001 brought us Parrot, pyperl and
pyperlish, 2002 brings to you 'bridgekeeper', a Perl to Python source
code converter.
What 'bridgekeeper' does
------------------------
'bridgekeeper' helps convert Perl code to Python source. The
quality of the output Python source depends on the quality of the
input code.
If there are constructs within the Perl code which are not possible in
Python, you will get warnings, e.g. when using a name for both a hash
and a scalar within the same scope (%foo, $foo).
Converting Perl to Python will be an iterative process; see the
TUTORIAL for more information about this.
'bridgekeeper' consists of
* a Perl compiler back-end emitting Python-like source and
* a runtime Python package which tries to emulate some Perl built-in
functions.
The name was inspired by the bridgekeeper scene in 'Monty Python and
the Holy Grail'.
Features
--------
* Already converts a lot of Perl code-constructs into Python source:
loops, special variables, function calls, lists, hashes, methods,
etc.
* Whenever there is no equivalent Python function for a Perl function,
the function is emulated using the included Python library.
* Includes a TUTORIAL 'How to convert Perl Code to Python Source'
* Warns about variables with the same name but different types within a
scope, e.g. when using a name for both a hash and a scalar (%foo,
$foo). This includes scalars, hashes, lists, arrays and (hopefully)
filehandles.
* Warns about mixing variables with the same name but different Perl
scoping within a Python scope, e.g. using (global) $foo and my $foo
in one scope.
* Strips statements like 'my ($foo, %bar);', that is: 'my' and 'local'
declarations without assignment.
* Special variables are renamed to their equivalent within the
included Python library.
* 'here-documents' are supported.
Limitations
-----------
* Does not support 'use', 'BEGIN', 'END', etc. This is due to how Perl
handles these expressions: they get executed while parsing. See the
TUTORIAL for solving this problem.
* Does not handle 'bless', 'packages', etc.
* References may or may not be resolved correctly.
* Does not convert things like 'my ($self, $arg1, $arg2) = @_;' into
a Pythonic argument list. (Hey, this would be a great feature!)
* In some rare cases Perl functions are converted to the
nearly-equivalent Python function instead of using an emulated
function. This is done for functions whose return value is
probably not often used.
* The included Python library contains all required functions, but
many of them are not yet implemented.
* Formats are not supported at all. (But I already have an idea of how
this could be done: use a class perl.Format for it.) This also
includes the function 'write'.
* Separate prototype declarations are not supported, whereas
prototypes at function definitions are taken over (as comments).
* Prefix/postfix operators are not resolved.
* Esoteric execution orders are not supported (see
test/weird_contexts.pl).
* The included Python library is not thread-safe -- and probably will
never become thread-safe.
* A lot more :-)
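The missing argument-list conversion listed above can be illustrated with a minimal Python sketch. The method name greet and its parameters are hypothetical, and this is not output produced by bridgekeeper; it only contrasts a literal translation of 'my ($self, $name, $greeting) = @_;' with the Pythonic form such a feature would aim for.

```python
class LiteralGreeter:
    # Literal translation: collect every argument (like Perl's @_),
    # then unpack them inside the body.
    def greet(*_):
        self, name, greeting = _
        return f"{greeting}, {name}!"


class PythonicGreeter:
    # Pythonic translation: the unpacking becomes the argument list itself.
    def greet(self, name, greeting):
        return f"{greeting}, {name}!"


print(LiteralGreeter().greet("Bob", "Hi"))
print(PythonicGreeter().greet("Bob", "Hi"))
```

Both calls behave the same; the Pythonic form simply lets Python check the number of arguments and makes the signature self-documenting.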
Availability
------------
'bridgekeeper' is available for download at
http://www.crazy-compilers.com/
IM (pronounced with a long I) is a Python module that makes
it easy to use Numeric and PIL together in programs. Typical
functions in IM are:
Open: Opens an image file using PIL and converts it
to Numeric, PIL, or OpenCV formats.
Save: Converts an array to PIL and saves it to a file.
Array_ToArrayCast: Converts images between formats and
between pixel types.
In addition to Numeric and PIL, IM works with the Intel OpenCV
computer vision system
(http://www.intel.com/research/mrl/research/opencv/). OpenCV is
available for Linux at the OpenCV Yahoo Group
(http://groups.yahoo.com/group/OpenCV/).
IM currently runs under Linux only. It should not be too difficult
to port the basic IM system to Windows or Mac. The OpenCV wrapper
is large and complex and uses SWIG. It will be harder to port.
The IM system appears to be pretty stable. On the other hand,
the OpenCV wrapper is probably very buggy.
To download the software go to
http://members.tripod.com/~edcjones/pycode.html and download
"PyCV.032502.tgz".
Edward C. Jones
edcjones(a)hotmail.com
I offer the following PEP for review by the community. If it receives
a favorable response, it will be implemented in Python 2.3.
A long discussion has already been held in python-dev about this PEP;
most things you could bring up have already been brought up there.
The head of the thread there is:
http://mail.python.org/pipermail/python-dev/2002-March/020750.html
I believe that the review questions listed near the beginning of the
PEP are the main unresolved issues from that discussion.
This PEP is also on the web, of course, at:
http://python.org/peps/pep-0285.html
If you prefer to look at code, here's a reasonably complete
implementation (in C; it may be slightly out of date relative to the
current CVS):
http://python.org/sf/528022
--Guido van Rossum (home page: http://www.python.org/~guido/)
PEP: 285
Title: Adding a bool type
Version: $Revision: 1.12 $
Last-Modified: $Date: 2002/03/30 05:37:02 $
Author: guido(a)python.org (Guido van Rossum)
Status: Draft
Type: Standards Track
Created: 8-Mar-2002
Python-Version: 2.3
Post-History: 8-Mar-2002, 30-Mar-2002
Abstract
This PEP proposes the introduction of a new built-in type, bool,
with two constants, False and True. The bool type would be a
straightforward subtype (in C) of the int type, and the values
False and True would behave like 0 and 1 in most respects (for
example, False==0 and True==1 would be true) except repr() and
str(). All built-in operations that conceptually return a Boolean
result will be changed to return False or True instead of 0 or 1;
for example, comparisons, the "not" operator, and predicates like
isinstance().
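A quick way to see the properties listed in the Abstract is to try them in an interpreter. The sketch below assumes Python 2.3 or later (where bool became a built-in type) and uses Python 3 print syntax; it is an illustration, not part of the specification.

```python
print(issubclass(bool, int))              # bool is a subtype of int: True
print(False == 0, True == 1)              # they compare equal to 0 and 1: True True
print(True + 1)                           # integer arithmetic still works: 2
print(repr(True), str(False))             # only repr()/str() differ: True False
print(type(3 > 2).__name__)               # comparisons return bool: bool
print(type(isinstance(3, int)).__name__)  # so do predicates like isinstance(): bool
```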
Review
Dear reviewers:
I'm particularly interested in hearing your opinion about the
following three issues:
1) Should this PEP be accepted at all?
2) Should str(True) return "True" or "1"? "1" might reduce
backwards compatibility problems, but looks strange to me.
(repr(True) would always return "True".)
3) Should the constants be called 'True' and 'False'
(corresponding to None) or 'true' and 'false' (as in C++, Java
and C99)?
Most other details of the proposal are pretty much forced by the
backwards compatibility requirement; e.g. True == 1 and
True+1 == 2 must hold, else reams of existing code would break.
Minor additional issues:
4) Should we strive to eliminate non-Boolean operations on bools
in the future, through suitable warnings, so that e.g. True+1
would eventually (e.g. in Python 3000) be illegal? Personally,
I think we shouldn't; 28+isleap(y) seems totally reasonable to
me.
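Assuming isleap(y) refers to the standard library's calendar.isleap() (which returns a bool in current Python), the kind of arithmetic this point wants to keep legal looks like the following sketch; days_in_february is a hypothetical helper name.

```python
import calendar


def days_in_february(year):
    # calendar.isleap() returns a bool; adding it to an int treats True as 1
    return 28 + calendar.isleap(year)


print(days_in_february(2000), days_in_february(1900), days_in_february(2002))
```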
5) Should operator.truth(x) return an int or a bool? Tim Peters
believes it should return an int because it's been documented
as such. I think it should return a bool; most other standard
predicates (e.g. issubclass()) have also been documented as
returning 0 or 1, and it's obvious that we want to change those
to return a bool.
Rationale
Most languages eventually grow a Boolean type; even C99 (the new
and improved C standard, not yet widely adopted) has one.
Many programmers apparently feel the need for a Boolean type; most
Python documentation contains a bit of an apology for the absence
of a Boolean type. I've seen lots of modules that defined
constants "False=0" and "True=1" (or similar) at the top and used
those. The problem with this is that everybody does it
differently. For example, should you use "FALSE", "false",
"False", "F" or even "f"? And should false be the value zero or
None, or perhaps a truth value of a different type that will print
as "true" or "false"? Adding a standard bool type to the language
resolves those issues.
Some external libraries (like databases and RPC packages) need to
be able to distinguish between Boolean and integral values, and
while it's usually possible to craft a solution, it would be
easier if the language offered a standard Boolean type.
The standard bool type can also serve as a way to force a value to
be interpreted as a Boolean, which can be used to normalize
Boolean values. Writing bool(x) is much clearer than "not not x"
and much more concise than
if x:
    return 1
else:
    return 0
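The three spellings mentioned above can be compared directly. In this Python 3 sketch, truth_old_style is just a hypothetical name for the if/else idiom quoted above; the loop checks that all three agree on truth and differ only in the type they return.

```python
def truth_old_style(x):
    # The pre-bool idiom, wrapped in a function
    if x:
        return 1
    else:
        return 0


for value in [0, 7, "", "abc", [], [None], None]:
    # bool(x), "not not x" and the old idiom agree; only the result type differs
    assert bool(value) == (not not value) == truth_old_style(value)
    print(repr(value), bool(value), truth_old_style(value))
```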
Here are some arguments derived from teaching Python. When
showing people comparison operators etc. in the interactive shell,
I think this is a bit ugly:
>>> a = 13
>>> b = 12
>>> a > b
1
>>>
If this was:
>>> a > b
True
>>>
it would require one millisecond less thinking each time a 0 or 1
was printed.
There's also the issue (which I've seen puzzling even experienced
Pythonistas who had been away from the language for a while) that if
you see:
>>> cmp(a, b)
1
>>> cmp(a, a)
0
>>>
you might be tempted to believe that cmp() also returned a truth
value. If ints are not (normally) used for Boolean results, this
would stand out much more clearly as something completely
different.
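Python 3 no longer has a built-in cmp(), so this sketch defines a small replacement (a common idiom, not something proposed by this PEP) to contrast an ordering result with a truth value. It also happens to rely on bools behaving like 0 and 1.

```python
def cmp(a, b):
    """Three-way comparison returning -1, 0 or 1."""
    # Each comparison yields a bool; subtracting them relies on True == 1
    return (a > b) - (a < b)


a, b = 13, 12
print(a > b)      # a truth value: True
print(cmp(a, b))  # an ordering: 1
print(cmp(a, a))  # 0
print(cmp(b, a))  # -1
```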
Specification
The following Python code specifies most of the properties of the
new type:
class bool(int):

    def __new__(cls, val=0):
        # This constructor always returns an existing instance
        if val:
            return True
        else:
            return False

    def __repr__(self):
        if self:
            return "True"
        else:
            return "False"

    __str__ = __repr__

    def __and__(self, other):
        if isinstance(other, bool):
            return bool(int(self) & int(other))
        else:
            return int.__and__(self, other)

    __rand__ = __and__

    def __or__(self, other):
        if isinstance(other, bool):
            return bool(int(self) | int(other))
        else:
            return int.__or__(self, other)

    __ror__ = __or__

    def __xor__(self, other):
        if isinstance(other, bool):
            return bool(int(self) ^ int(other))
        else:
            return int.__xor__(self, other)

    __rxor__ = __xor__

# Bootstrap truth values through sheer willpower
False = int.__new__(bool, 0)
True = int.__new__(bool, 1)
The values False and True will be singletons, like None; the C
implementation will not allow other instances of bool to be
created. At the C level, the existing globals Py_False and
Py_True will be appropriated to refer to False and True.
All built-in operations that are defined to return a Boolean
result will be changed to return False or True instead of 0 or 1.
In particular, this affects comparisons (<, <=, ==, !=, >, >=, is,
is not, in, not in), the unary operator 'not', the built-in
functions callable(), hasattr(), isinstance() and issubclass(),
the dict method has_key(), the string and unicode methods
endswith(), isalnum(), isalpha(), isdigit(), islower(), isspace(),
istitle(), isupper(), and startswith(), the unicode methods
isdecimal() and isnumeric(), and the 'closed' attribute of file
objects.
Note that subclassing from int means that True+1 is valid and
equals 2, and so on. This is important for backwards
compatibility: because comparisons and so on currently return
integer values, there's no way of telling what uses existing
applications make of these values.
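The bitwise behaviour spelled out in the specification code can be checked against the bool type Python eventually shipped: &, | and ^ return a bool only when both operands are bools and fall back to int arithmetic otherwise, and bool() only ever hands back the two existing instances. A short check, assuming Python 3:

```python
print(True & False, type(True & False).__name__)  # bool & bool -> bool: False bool
print(True | 2, type(True | 2).__name__)          # bool | int -> int: 3 int
print(True ^ True, type(True ^ True).__name__)    # bool ^ bool -> bool: False bool
print(bool(42) is True, bool("") is False)        # bool() returns the singletons
```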
Compatibility
Because of backwards compatibility, the bool type lacks many
properties that some would like to see. For example, arithmetic
operations with one or two bool arguments are allowed, treating
False as 0 and True as 1. Also, a bool may be used as a sequence
index.
I don't see this as a problem, and I don't want to evolve the
language in this direction either; I don't believe that a stricter
interpretation of "Booleanness" makes the language any clearer.
Another consequence of the compatibility requirement is that the
expression "True and 6" has the value 6, and similarly the
expression "False or None" has the value None. The "and" and "or"
operators are usefully defined to return the first argument that
determines the outcome, and this won't change; in particular, they
don't force the outcome to be a bool. Of course, if both
arguments are bools, the outcome is always a bool. It can also
easily be coerced into being a bool by writing for example
"bool(x and y)".
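The and/or behaviour described above can be seen directly in a Python 3 interpreter; the values in this sketch are arbitrary.

```python
print(True and 6)      # 6: "and" returns the operand that decided the outcome
print(False or None)   # None: likewise for "or"
print(True and False)  # False: with two bool operands the result is a bool

x, y = "spam", 0
print(bool(x and y))   # False: coerce explicitly when a bool is required
```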
Issues
Because the repr() or str() of a bool value is different from an
int value, some code (for example doctest-based unit tests, and
possibly database code that relies on things like "%s" % truth)
may fail. How much of a backwards compatibility problem this will
be, I don't know. If this turns out to be a real problem, we
could change the rules so that str() of a bool returns "0" or
"1", while repr() of a bool still returns "False" or "True".
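The "%s" % truth case mentioned above, plus two ways code can keep producing the old "1", sketched in Python 3 (where, as it turned out, str(True) is "True"):

```python
truth = 3 > 2
print("%s" % truth)       # "True": text that older code may not expect
print("%d" % truth)       # "1": numeric formatting still yields the old text
print("%s" % int(truth))  # "1": converting explicitly also keeps the old output
```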
Other languages (C99, C++, Java) name the constants "false" and
"true", in all lowercase. In Python, I prefer to stick with the
example set by the existing built-in constants, which all use
CapitalizedWords: None, Ellipsis, NotImplemented (as well as all
built-in exceptions). Python's built-in module uses all lowercase
for functions and types only. But I'm willing to consider the
lowercase alternatives if enough people think it looks better.
It has been suggested that, in order to satisfy user expectations,
for every x that is considered true in a Boolean context, the
expression x == True should be true, and likewise if x is
considered false, x == False should be true. This is of course
impossible; it would mean that e.g. 6 == True and 7 == True, from
which one could infer 6 == 7. Similarly, [] == False == None
would be true, and one could infer [] == None, which is not the
case. I'm not sure where this suggestion came from; it was made
several times during the first review period. For truth testing
of a value, one should use "if", e.g. "if x: print 'Yes'", not
comparison to a truth value; "if x == True: print 'Yes'" is not
only wrong, it is also strangely redundant.
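A small demonstration of why comparing with True is not a truth test, in Python 3 syntax with arbitrary sample values: x == True holds only for values equal to 1, so it misses most values that "if x:" treats as true.

```python
for x in [6, 7, [], None, "", "yes"]:
    # Compare the (wrong) equality check with the actual truth value
    print(repr(x), x == True, bool(x))

x = 6
if x:  # the recommended truth test
    print("Yes")
```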
Implementation
An experimental, but fairly complete implementation in C has been
uploaded to the SourceForge patch manager:
http://python.org/sf/528022
Copyright
This document has been placed in the public domain.
I've just made an "emergency" release of the 2.2 documentation (based
on the release22-maint branch in CVS). If you use Opera 6.01, you can
now feel a little safer on python.org. ;-)
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation
The Scheme Workshop will be held this year in Pittsburgh, in October, the
day before ICFP. I am the workshop chairman. I am taking the liberty of
cross-posting the call for papers to some netnews programming-language
groups with related topics. We would certainly welcome papers on relevant
topics from your communities as well as attendance and participation in
the workshop's discussions.
-Olin Shivers
===============================================================================
ACM SIGPLAN
2002 Scheme Workshop
Thursday, October 3, 2002
Pittsburgh, Pennsylvania
The workshop forms part of PLI 2002, which consists
of the ICFP and PPDP conferences and other workshops.
Full details on the workshop can be found at URL
<http://scheme2002.ccs.neu.edu/>.
-------------------------------------------------------------------------------
* Scope
-------
The purpose of the workshop is to discuss experience with and future
developments of the Scheme programming language, as well as general aspects
of computer science loosely centered on the general theme of Scheme.
Papers are invited concerning all aspects of the design, semantics, theory,
application, implementation, and teaching of Scheme. Some example areas
include (but are not limited to):
- Theory
Formal semantics, correctness of analyses and transformations,
lambda calculus.
- Design critiques
Limitations of the language, future directions.
- Linguistic extensions
Scheme's simple syntactic framework and minimal static semantics
have historically made the language an attractive "lab bench" for the
development and experimentation of novel language features and mechanisms.
Topics in this area include modules systems, exceptions, control
mechanisms, distributed programming, concurrency and synchronisation, macro
systems, and objects.
- Type systems
Static analyses for dynamic type systems, type systems that bridge
the gap between static and dynamic types, static systems with
"type dynamic" extensions, weak typing.
- Implementation
Compilers, optimisation, virtual machines, resource management,
interpreters, foreign-function interfaces, partial evaluation, and generally
implementations with novel or noteworthy features.
- Program-development environments
The Lisp and Scheme family of programming languages have traditionally
been the source of innovative program-development environments. Authors
working on these issues are encouraged to submit papers describing
their technologies.
- Education
Scheme has achieved widespread use as a tool for teaching computer science.
Papers on the theory and practice of teaching with Scheme are invited.
- Applications and experience
Interesting applications which illuminate aspects of Scheme;
experience with Scheme in commercial or real-world contexts;
use of Scheme as an extension or scripting language.
- Scheme pearls
Elegant, instructive examples of functional programming.
A "Scheme pearl" submission is a special category, and should be a short
paper presenting an algorithm, idea or programming device using Scheme in a
way that is particularly elegant.
-------------------------------------------------------------------------------
* Proceedings
-------------
The proceedings of the conference will be published as a Georgia Tech College
of Computing technical report. A special issue of the journal Higher-Order and
Symbolic Computation about Scheme <http://www.kluweronline.com/issn/1388-3690>
is planned afterwards.
-------------------------------------------------------------------------------
* Important dates
-----------------
- Submission deadline: 2200 UTC, 17 May, 2002
- Notification of acceptance or rejection: 28 June, 2002
- Final paper due: 31 August, 2002
- Workshop: 3 October, 2002
-------------------------------------------------------------------------------
* Submission guidelines
-----------------------
Authors should submit a 100-200 word abstract and a full paper by 22:00
Coordinated Universal Time (UTC) on 17 May, 2002. (Note that 2200
UTC is 1800 EDT, or 1500 PDT.)
Submissions will be carried out electronically via the Web, at
<http://scheme2002.ccs.neu.edu/>. Papers must be submitted in PDF format, or as
PostScript documents that are interpretable by Ghostscript, and they must be
printable on US Letter sized paper. Individuals for whom this requirement is
a hardship should contact the program chair at least one week before the
deadline.
There are two classes of submissions: regular papers and short papers:
- Regular papers
Submissions should be no more than 12 pages (including bibliography and
appendices) in standard ACM conference format: two columns, nine point font
on ten point baseline, each column 20pc (3.33in) wide and 54pc (9in) tall, with a
column gutter of 2pc (0.33in). Authors wishing to supply additional material
to the reviewers beyond the 12 page limit can do so in clearly marked
appendices, on the understanding that reviewers are not required to read the
appendices. Submissions that do not meet these guidelines will not be
considered. Suitable style files for LaTeX, Word, and WordPerfect can be
found on the submission Web site.
Submitted papers must have content that has not previously been published in
other conferences or refereed venues, and simultaneous submission to other
conferences or refereed venues is unacceptable. Each paper should explain its
contributions in both general and technical terms, clearly identifying what
has been accomplished, saying why it is significant, and comparing it with
previous work. Authors should strive to make the technical content of their
papers understandable to a broad audience.
- Short papers
Short papers need not present novel research; it is sufficient that they
present material of interest or utility to the Scheme or
functional-programming community. "Scheme pearls" submissions should
be presented as short papers.
Short papers should be formatted with the same guidelines as regular
papers, but are expected to be typically around six pages in length.
-------------------------------------------------------------------------------
* Organisers
------------
Workshop chair:
Olin Shivers
College of Computing
Georgia Tech
Atlanta, Ga 30332-0280
+1 404 385.00.91
shivers(a)cc.gatech.edu
Steering committee:
William D Clinger (Northeastern University)
Marc Feeley (University of Montreal)
Matthias Felleisen (Northeastern University)
Matthew Flatt (University of Utah)
Dan Friedman (Indiana University)
Christian Queinnec (University Paris 6)
Manuel Serrano (INRIA)
Mitchell Wand (Northeastern University)
Program committee:
Alan Bawden (Brandeis)
Olivier Danvy (University of Aarhus)
Richard Kelsey (Ember, Corp.)
Brad Lucier (Purdue University)
Paul Steckler (Northeastern University)
Andrew Wright (Aleris)
Publicity:
Shriram Krishnamurthi (Brown University)
The second public version of Fetchem has been released. It can be
found at "http://sourceforge.net/projects/fetchem/". Fetchem is a
download/filter/decode program for image newsgroups written
entirely in Python. It uses a variety of algorithms to filter spam
out of image newsgroups.
There is currently so much spam in these newsgroups that it has
become very difficult to read them. Good spam filters require a
fully powered programming language. Some of the spam can be
removed by regex searches of the news article headers. But
removing other spam, including the notorious high volume
P`H.E'R-OM,O^NE spam, requires the power of a complete programming
language. Since Python (http://www.python.org) is easy to read and
write, it is used in Fetchem.
FEATURES
Some of the features of Fetchem (terrible name) are:
1. Powerful filtering capabilities that the user can reprogram.
(See match.py.)
2. Prepares HTML output for your browser. "html.py" contains (yet
another) HTML writing program. It uses Python's keyword
arguments to pass in the attributes for each tag. For example,
FONT('stuff', Color='ff0000') returns the string
<FONT COLOR="ff0000">stuff</FONT>
3. The header data is kept in a robust MySQL database.
4. Downloads images you choose. Uses uudeview to decode the
images.
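The keyword-argument technique described for html.py in item 2 can be sketched in a few lines. make_tag is a hypothetical helper, not Fetchem's actual code: it upper-cases each keyword into an attribute name, escapes attribute values, and leaves the content untouched so that tags can be nested.

```python
import html


def make_tag(name):
    """Return a function rendering <NAME ATTR="value">content</NAME>."""
    def render(content, **attributes):
        # Keyword names become upper-case attributes, in the order given
        attrs = "".join(
            f' {key.upper()}="{html.escape(str(value), quote=True)}"'
            for key, value in attributes.items()
        )
        return f"<{name}{attrs}>{content}</{name}>"
    return render


FONT = make_tag("FONT")
print(FONT("stuff", Color="ff0000"))  # <FONT COLOR="ff0000">stuff</FONT>
```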
REQUIREMENTS
A Linux system with:
Python 2.2 + http://www.python.org/
MySQL I used version 3.23.
http://www.mysql.com
MySQLdb I used MySQL-python-0.3.5.
http://sourceforge.net/projects/mysql-python
uudeview 0.5.17 or later.
http://www.fpx.de/fp/Software/UUDeview/
getdatemodule getdatemodule-19990617-1127-jam.tgz
ftp://starship.python.net/pub/crew/jam/
Version 0.06 of SCons has been released and is available for download
from the SCons web site:
http://www.scons.org/
Or through the download link at the SCons project page at SourceForge:
http://sourceforge.net/projects/scons/
RPM and Debian packages and a Win32 installer are all available, in
addition to the traditional .tar.gz and .zip files.
WHAT'S NEW IN THIS RELEASE?
IMPORTANT: This release contains the following interface changes:
- FunctionAction arguments are now Nodes, not strings.
This release adds the following features:
- New RANLIB and RANLIBFLAGS construction variables.
- A new configurable CFILESUFFIX for the Builder that turns .l and .y
files into C files.
- A CXXFile Builder that turns .ll and .yy files into .cc files
(configurable via a CXXFILESUFFIX construction variable).
- A new --profile=FILE option to make profiling SCons easier.
- Support for Aliases (phony targets).
- A new WhereIs() method for searching for path names to executables.
- New PDF and PostScript document builders.
- New support for compiling Fortran programs from a variety of
suffixes (à la GNU Make): .f, .F, .for, .FOR, .fpp and .FPP
- A new CPPFLAGS variable on all default commands that use the
C preprocessor.
- CPPPATH, LIBPATH and LIBS can now be specified as white-space
separated lists of directories/libraries.
- A new -U option.
- Support env['VAR'] syntax to fetch construction variable values.
The following fixes have been added:
- Command generators now expand construction variables.
- Use the POSIX-standard lex -t flag, not the GNU-specific -o flag.
(Bug reported by Russell Christensen.)
- Fixed an exception when CPPPATH or LIBPATH is a null string.
(Bug reported by Richard Kiss.)
- Construction variables with values of 0 were incorrectly
interpolated as ''.
Performance has been improved as follows:
- A dictionary, not a list, is now used to track a Node's parents.
The following changes have been made to the SCons packaging:
- Both installation and source packages are now available as .zip
files, in addition to .tar.gz files.
The documentation has been improved:
- The LIBS and ARGUMENTS construction variables have been documented.
- The Precious() method has been documented.
WHAT IS SCONS?
SCons is a software construction tool (build tool, or make tool) written
in Python. Its design is based on the design which won the Software
Carpentry build tool competition in August 2000 (in turn derived from
the Perl-based Cons build tool).
Distinctive features of SCons include:
- configuration files are Python scripts, allowing the full use of a
real scripting language to solve build problems
- a modular architecture allows the SCons Build Engine to be
embedded in other Python software
- a global view of all dependencies; no multiple passes to get
everything built
- the ability to scan files for implicit dependencies (#include files)
- improved parallel build (-j) support
- use of MD5 signatures to decide if a file has changed
- easily extensible through user-defined Builder and Scanner objects
- build actions can be Python code, as well as external commands
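The "global view of all dependencies" point can be illustrated with a generic depth-first topological sort: once every target's dependencies are known up front, a single pass yields an order in which each file comes after its inputs. This is a minimal sketch under that assumption, not SCons's actual algorithm; build_order and the file names are made up.

```python
def build_order(deps):
    """Return targets so that every target comes after its dependencies."""
    order, done, visiting = [], set(), set()

    def visit(node):
        if node in done:
            return
        if node in visiting:
            raise ValueError(f"dependency cycle at {node!r}")
        visiting.add(node)
        for dep in deps.get(node, ()):  # leaf files have no entry
            visit(dep)
        visiting.discard(node)
        done.add(node)
        order.append(node)

    for target in deps:
        visit(target)
    return order


deps = {
    "prog": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}
print(build_order(deps))
```

Raising on a cycle rather than looping forever is a design choice of this sketch; a real build tool would report the whole cycle.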
An scons-users mailing list has been created for those interested in
getting started using SCons. You can subscribe at:
http://lists.sourceforge.net/lists/listinfo/scons-users
Alternatively, we invite you to subscribe to the low-volume
scons-announce mailing list to receive notification when new versions of
SCons become available:
http://lists.sourceforge.net/lists/listinfo/scons-announce
ACKNOWLEDGEMENTS
Special thanks to Charles Crain, Stephen Kennedy, Steve Leblanc, and
Anthony Roach for their contributions to this release.
On behalf of the SCons team,
--SK
I've made a tidied-up release of the Medusa toolkit for writing
asynchronous servers. This release adds no new features to the last
release, but discards some old unmaintained code, fixes some bugs, and
moves everything into the 'medusa' package.
The code is available from http://www.amk.ca/python/code/medusa.html .
A full list of my changes is:
Version 0.5.1:
* Apply cleanup patch from Donovan Baarda
* Fix bug reported by Van Gale: counter.py and auth_handler.py did
long(...)[:-1] to chop off a trailing L generated in earlier
versions of Python.
* Fix bug in ftp_server.py that I introduced in 0.5
* Remove some duplicated producer classes
* Removed work_in_progress/ directory and the 'continuation' module
* Remove MIME type table code and use the stdlib's mimetypes module instead
Version 0.5:
* Added a setup.py installation script, which will install all the code
under the package name 'medusa'.
* Added README.txt and CHANGES.txt.
* Fixed NameError in util/convert_mime_type_table.py
* Fixed TypeError in test/test_medusa.py
* Fixed several problems detected by PyChecker
* Changed demos to use 'from medusa import ...'
* Rearranged files to reduce the number of subdirectories.
* Removed or updated uses of the obsolete regsub module
* Removed asyncore.py and asynchat.py; these modules were added to Python's
standard library with version 1.5.2, and Medusa now assumes that they're
present.
* Removed many obsolete files:
poll/pollmodule.c, as Python's select module now supports poll()
patches/posixmodule.c, as that patch was incorporated in Python
old/*, script_handler_demo/*, sendfile/*
The old ANNOUNCE files
* Reindented all files to use four-space indents
--
A.M. Kuchling http://www.amk.ca
Only the phoenix arises and does not descend. And everything changes.
And nothing is truly lost.
-- The true end of the series, in SANDMAN #74, "The Exile"
A package database is a necessary prerequisite for managing the Python
packages installed on a system. PEP 262 lists the requirements for
such a database and specifies a storage format for it.
I'd like to get this into Python 2.3, hopefully with a
still-to-be-specified package management tool. Assuming no one points
out some requirement or use case missing from this draft of the PEP,
my next step will be to write a proposed interface, post that draft,
and then implement the PEP and integrate it with the Distutils.
Comments can be posted to comp.lang.python or to the Distutils SIG.
--
A.M. Kuchling http://www.amk.ca
Thank you for letting me borrow your objects.
-- Ute Lemper in concert, March 13, 1997
PEP: 262
Title: A Database of Installed Python Packages
Version: $Revision: 1.5 $
Author: A.M. Kuchling <akuchlin(a)mems-exchange.org>
Type: Standards Track
Created: 08-Jul-2001
Status: Draft
Post-History: 27-Mar-2002
Introduction
This PEP describes a format for a database of Python packages
installed on a system.
Requirements
We need a way to figure out what packages, and what versions of
those packages, are installed on a system. We want to provide
features similar to CPAN, APT, or RPM. Required use cases that
should be supported are:
* Is package X on a system?
* What version of package X is installed?
* Where can the new version of package X be found? (This can
be defined as either "a home page where the user can go and
find a download link", or "a place where a program can find
the newest version". Both should probably be supported.)
* What files did package X put on my system?
* What package did the file x/y/z.py come from?
* Has anyone modified x/y/z.py locally?
Database Location
The database lives in a bunch of files under
<prefix>/lib/python<version>/install/. This location will be
called INSTALLDB through the remainder of this PEP.
The structure of the database is deliberately kept simple; each
file in this directory or its subdirectories (if any) describes a
single package.
The rationale for scanning subdirectories is that we can move to a
directory-based indexing scheme if the package directory contains
too many entries. For example, this would let us transparently
switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some
similar hashing scheme.
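The PEP leaves the exact scheme open ("or some similar hashing scheme"). One hypothetical reading of INSTALLDB/N/Nu/Numeric, taking successively longer prefixes of the package name as directory levels, is sketched below; hashed_path and its depth parameter are assumptions, not part of the PEP.

```python
import os


def hashed_path(installdb, package, depth=2):
    """Map a package name to an INSTALLDB/N/Nu/Numeric-style location."""
    # One directory level per prefix: "N", then "Nu", ...
    prefixes = [package[:i] for i in range(1, depth + 1)]
    return os.path.join(installdb, *prefixes, package)


print(hashed_path("INSTALLDB", "Numeric"))  # INSTALLDB/N/Nu/Numeric on POSIX
```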
Database Contents
Each file in INSTALLDB or its subdirectories describes a single
package, and has the following contents:
An initial line listing the sections in this file, separated
by whitespace. Currently this will always be 'PKG-INFO
FILES'. This is for future-proofing; if we add a new section,
for example to list documentation files, then we'd add a DOCS
section and list it in the contents. Sections are always
separated by blank lines.
PKG-INFO section
An initial set of RFC-822 headers containing the package
information for a file, as described in PEP 241, "Metadata for
Python Software Packages".
A blank line indicating the end of the PKG-INFO section.
FILES section
An entry for each file installed by the package. Generated files
such as .pyc and .pyo files are on this list as well as the original
.py files installed by a package; their checksums won't be stored or
checked, though.
Each file's entry is a single tab-delimited line that contains
the following fields:
* The file's full path, as installed on the system.
* The file's size.
* The file's permissions. On Windows, this field will always be
'unknown'.
* The owner and group of the file, separated by a tab.
On Windows, these fields will both be 'unknown'.
* An MD5 digest of the file, encoded in hex.
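A sketch of producing and checking one FILES entry follows. It assumes permissions are written as an octal string and owner and group as numeric ids (the PEP does not fix either representation), and it does not handle paths containing tabs. Recomputing the digest also answers the "Has anyone modified x/y/z.py locally?" use case; file_entry, md5_hex and is_modified are hypothetical names.

```python
import hashlib
import os
import stat
import sys


def md5_hex(path):
    """Hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_entry(path):
    """One tab-delimited line: path, size, perms, owner, group, MD5."""
    st = os.stat(path)
    if sys.platform == "win32":
        perms = owner = group = "unknown"
    else:
        perms = oct(stat.S_IMODE(st.st_mode))
        owner, group = str(st.st_uid), str(st.st_gid)
    return "\t".join([path, str(st.st_size), perms, owner, group, md5_hex(path)])


def is_modified(entry):
    """True if the file's current MD5 differs from the recorded one."""
    fields = entry.split("\t")
    return md5_hex(fields[0]) != fields[-1]
```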
A package that uses the Distutils for installation should
automatically update the database. Packages that roll their own
installation will have to use the database's API to manually
add or update their own entry. System package managers such as
RPM or pkgadd can just create the new 'package name' file in the
INSTALLDB directory.
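Reading a package file back could look like the following minimal sketch, assuming the layout described above: one line naming the sections, the PKG-INFO headers, a blank line, then one tab-delimited FILES line per file. It uses the standard library's email.parser.HeaderParser for the RFC-822 headers; parse_package_file is a hypothetical name, not an API defined by this PEP.

```python
from email.parser import HeaderParser


def parse_package_file(text):
    """Split a package file into its section list, PKG-INFO headers and FILES rows."""
    first, _, rest = text.partition("\n")
    sections = first.split()
    # Tolerate an optional blank line after the section list
    pkg_info_text, _, files_text = rest.lstrip("\n").partition("\n\n")
    headers = HeaderParser().parsestr(pkg_info_text + "\n\n")
    files = [line.split("\t") for line in files_text.splitlines() if line.strip()]
    return sections, headers, files
```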
Deliverables
A description of the database API, to be added to this PEP.
Patches to the Distutils that 1) implement an InstallationDatabase
class, 2) update the database when a new package is installed, and 3)
provide a simple package management tool, with features to be added
to this PEP. (Or a separate PEP?)
Rejected Suggestions
Instead of using one text file per package, one large text file or
an anydbm file could be used. This has been rejected for a few
reasons. First, performance is probably not an extremely pressing
concern as the package database is only used when installing or
removing packages, a relatively infrequent task. Scalability also
likely isn't a problem, as people may have hundreds of Python
packages installed, but thousands seems unlikely. Finally,
individual text files are compatible with installers such as RPM
or DPKG because a package can just drop the new database file into
the database directory. If one large text file or a binary file
were used, the Python database would then have to be updated by
running a postinstall script.
On Windows, the permissions and owner/group of a file aren't
stored. Windows does in fact support ownership and access
permissions, but reading and setting them requires the win32all
extensions, and they aren't present in the basic Python installer
for Windows.
References
[1] Michael Muller's patch (posted to the Distutils-SIG around 28
Dec 1999) generates a list of installed files.
Acknowledgements
Ideas for this PEP originally came from postings by Greg Ward,
Fred L. Drake Jr., Thomas Heller, Mats Wichmann, and others.
Many changes and rewrites to this document were suggested by the
readers of the Distutils SIG.
Copyright
This document has been placed in the public domain.