Electric Duncan: python


Wednesday, August 09, 2017

NASA/EOSDIS Earthdata

Update

It's been a few years since I posted on this blog -- most of the technical content I've been contributing to in the past couple years has been in the following:

The LFE Blog

The Clojang Blog

But since the publication of the Mastering matplotlib book, I've gotten more and more into satellite data. The book, it goes without saying, focused on Python for the analysis and interpretation of satellite data (in one of the many topics covered). After that I spent some time working with satellite and GIS data in general using Erlang and LFE. Ultimately though, I found that more and more projects were using the JVM for this sort of work, and in particular, I noted that Clojure had begun to show up in a surprising number of Github projects.

EOSDIS

Enter NASA's Earth Observing System Data and Information System (see also earthdata.nasa.gov and EOSDIS on Wikipedia), a key part of the agency's Earth Science Data Systems Program. It's essentially a concerted effort to bring together the mind-blowing amounts of earth-related data being collected throughout, around, and above the world so that scientists may easily access and correlate earth science data for their research.

Related NASA projects include the following:

The acronym menagerie can be bewildering, but digging into the various NASA projects is ultimately quite rewarding (greater insights, previously unknown resources, amazing research, etc.).

Clojure

Back to the Clojure reference I made above: I've been contributing to the nasa/Common-Metadata-Repository open source project (hosted on Github) for a few months now, and it's been amazing to see how all this data from so many different sources gets added, indexed, updated, and generally made so much more available to any who want to work with it. The private sector always seems to be so far ahead of large projects in terms of tech and continuously improving updates to existing software, so it's been pretty cool to see a large open source project in the NASA Github org make so many changes that find ways to keep helping their users do better research. Even more so because users regularly receive new features in a large, complex collection of libraries and services, thanks in part to the benefits that come from using a functional programming language.

It may seem like nothing to you, but the fact that there are now directory pages for various data providers (e.g., GES_DISC, i.e., Goddard Earth Sciences Data and Information Services Center) makes a big difference for users of this data. The data provider pages now also offer easy access to collection links such as UARS Solar Ultraviolet Spectral Irradiance Monitor. Admittedly, the directory pages still take a while to load, but there are improvements on the way for page load times and other related tasks. If you're reading this a month after this post was written, there's a good chance it's already been fixed by now.

Summary

In summary, it's been a fun personal journey from looking at Landsat data for writing a book to working with open source projects that really help scientists to do their jobs better :-) And while I have enjoyed using the other programming languages to explore this problem space, Clojure in particular has been a delightfully powerful tool for delivering new features to the science community.

Friday, July 10, 2015

Mastering matplotlib: Acknowledgments

The Book

Well, after nine months of hard work, the book is finally out! It's available both on Packt's site and Amazon.com. Getting up early every morning to write takes a lot of discipline; it takes even more to say "no" to enticing rabbit holes or herds of Yak with luxurious coats ripe for shaving ... (truth be told, I still did a bit of that).

The team I worked with at Packt was just amazing. Highly professional and deeply supportive, they were a complete pleasure with which to collaborate. It was the best experience I could have hoped for. Thanks, guys!

The technical reviewers for the book were just fantastic. I've stated elsewhere that my one regret was that the process with the reviewers did not have a tighter feedback loop. I would have really enjoyed collaborating with them from the beginning so that some of their really good ideas could have been integrated into the book. Regardless, their feedback as I got it later in the process helped make this book more approachable by readers, more consistent, and more accurate. The reviewers have bios at the beginning of the book -- read them, and look them up! These folks are all amazing!

The one thing that slipped in the final crunch was the acknowledgements, and I hope to make up for that here, as well as through various emails to everyone who provided their support, either directly or indirectly.

Acknowledgments

The first two folks I reached out to when starting the book were both physics professors who had published very nice matplotlib problems -- one set for undergraduate students and another from work at the National Radio Astronomy Observatory. I asked for their permission to adapt these problems to the API chapter, and they graciously granted it. What followed were some very nice conversations about matplotlib, programming, physics, education, and publishing. Thanks to Professor Alan DeWeerd, University of Redlands and Professor Jonathan W. Keohane, Hampden Sydney College. Note that Dr. Keohane has a book coming out in the fall from Yale University Press entitled Classical Electrodynamics -- it will contain examples in matplotlib.

Other examples adapted for use in the API chapter included one by Professor David Bailey, University of Toronto. Though his example didn't make it into the book, it gets full coverage in the Chapter 3 IPython notebook.

For one of the EM examples I needed to derive a particular equation for an electromagnetic field in two wires traveling in opposite directions. It's been nearly 20 years since my post-Army college physics, so I was very grateful for the existence and excellence of SymPy which enabled me to check my work with its symbolic computations. A special thanks to the SymPy creators and maintainers.

Please note that if there are errors in the equations, they are my fault! Not that of the esteemed professors or of SymPy :-)

Many of the examples throughout the book were derived from work done by the matplotlib and Seaborn contributors. The work they have done on the documentation in the past 10 years has been amazing -- the community is truly lucky to have such resources at their fingertips.

In particular, Benjamin Root is an astounding community supporter on the matplotlib mail list, helping users of every level with all of their needs. Benjamin and I had several very nice email exchanges during the writing of this book, and he provided some excellent pointers, as he was finishing his own title for Packt: Interactive Applications Using Matplotlib. It was geophysicist and matplotlib savant Joe Kington who originally put us in touch, and I'd like to thank Joe -- on everyone's behalf -- for his amazing answers to matplotlib and related questions on StackOverflow. Joe inspired many changes and adjustments in the sample code for this book. In fact, I had originally intended to feature his work in the chapter on advanced customization (but ran out of space), since Joe has one of the best examples out there for matplotlib transforms. If you don't believe me, check out his work on stereonets. There are many of us who hope that Joe will be authoring his own matplotlib book in the future ...

Olga Botvinnik, a contributor to Seaborn and PhD candidate at UC San Diego (and BioEng/Math double major at MIT), provided fantastic support for my Seaborn questions. Her knowledge, skills, and spirit of open source will help build the community around Seaborn in the years to come. Thanks, Olga!

While on the topic of matplotlib contributors, I'd like to give a special thanks to John Hunter for his inspiration, hard work, and passionate contributions which made matplotlib a reality. My deepest condolences to his family and friends for their tremendous loss.

Quite possibly the tool that had the single-greatest impact on the authoring of this book was IPython and its notebook feature. This brought back all the best memories from using Mathematica in school. Combined with the Python programming language, I can't imagine a better platform for collaborating on math-related problems or producing teaching materials for the same. These compliments are not limited to the user experience, either: the new architecture using ZeroMQ is a work of art. Nicely done, IPython community! The IPython notebook index for the book is available in the book's Github org here.

In Chapters 7 and 8 I encountered a bit of a crisis when trying to work with Python 3 in cloud environments. What was almost a disaster ended up being rescued by the work that Barry Warsaw and the rest of the Ubuntu team did in Ubuntu 15.04, getting Python 3.4.2 into the release and available on Amazon EC2. You guys saved my bacon!

Chapter 7's fictional case study examining the Landsat 8 data for part of Greenland was based on one of Milos Miljkovic's tutorials from PyData 2014, "Analyzing Satellite Images With Python Scientific Stack". I hope readers have just as much fun working with satellite data as I did. Huge thanks to NASA, USGS, the Landsat 8 teams, and the EROS facility in Sioux Falls, SD.

My favourite section in Chapter 8 was the one on HDF5. This was greatly inspired by Yves Hilpisch's presentation "Out-of-Memory Data Analytics with Python". Many thanks to Yves for putting that together and sharing with the world. We should all be doing more with HDF5.

Finally, and this almost goes without saying, the work that the Python community has done to create Python 3 has been just phenomenal. Guido's vision for the evolution of the language, combined with the efforts of the community, have made something great. I had more fun working on Python 3 than I have had in many years.

Thursday, January 01, 2015

Scientific Computing and the Joy of Language Interop

The scientific computing platform for Erlang/LFE has just been announced on the LFE blog. Though written in the Erlang Lisp syntax of LFE, it's fully usable from pure Erlang. It wraps the new py library for Erlang/LFE, as well as the ErlPort project. More importantly, though, it wraps Python 3 libs (e.g., math, cmath, statistics, and more to come) and the ever-eminent NumPy and SciPy projects (those are in-progress, with matplotlib and others to follow).

(That LFE blog post is actually a tutorial on how to use lsci for performing polynomial curve-fitting and linear regression, adapted from the previous post on Hy doing the same.)

With the release of lsci, one can now start to easily and efficiently perform computationally intensive calculations in Erlang/LFE (and any other Erlang Core-compatible language, e.g., Elixir, Joxa, etc.) That's super-cool, but it's not quite the point ...

While working on lsci, I found myself experiencing a great deal of joy. It wasn't just the fact that supervision trees in a programming language are insanely great. Nor just the fact that the scientific computing ecosystem in Python is one of the best in any language. It wasn't only being able to use two syntaxes that I love (LFE and Python) cohesively, in the same project. And it wasn't the sum of these either -- you probably see where I'm going with this ;-) The joy of these and many other fantastic aspects of inter-operation between multiple powerful computing systems is truly greater than the sum of its parts.

I've done a bunch of Julia lately and am a huge fan of this language as well. One of the things that Julia provides is explicit interop with Python. Julia is targeted at the world of scientific computing, aiming to be a compelling alternative to Fortran (hurray!), so their recognition of the enormous contribution the Python scientific computing community has made to the industry is quite wonderful to see.

A year or so ago I did some work with Clojure and LFE using Erlang's JInterface. Around the same time I was using LFE on top of Erjang, calling directly into Java without JInterface. This is the same sort of Joy that users of Jython have, and there are many more examples of languages and tools working to take advantage of the massive resources available in the computing community.

Obviously, language inter-op is not new. Various FFIs have existed for quite some time (I'm a big fan of the Common Lisp CFFI), but what is new (relatively, that is ... as I age, anything in the past 10 years is new) is that we are seeing this not just for programs reaching down into C/C++, but reaching across, to other higher-level languages, taking advantage of their great achievements -- without having to reinvent so many wheels.

When this level of cooperation, credit, etc., is done in the spirit of openness, peer-review, code-reuse, and standing on the shoulders of giants (or enough people to make giants!), we get joy. Beautiful, wonderful coding joy.

And it's so much greater than the sum of the parts :-)

Saturday, December 27, 2014

Improved Python Support in Erlang/LFE

The previous post on Python support in Erlang/LFE made Hacker News this week, climbing in fits and starts to #19 on the front page. That resulted in the biggest spike this blog has seen in several months.

It's a shame, in a way, since it came a few days too early: there's a new library out for the Erlang VM (written in LFE) which makes it much easier to use Python from Erlang (the language from Sweden that's famous for impressing both your mum and your cats).

The library is simply called py. It's a wrapper for ErlPort, providing improved usability for Python-specific code as well as an Erlang process supervision tree for the ErlPort Python server. It has an extensive README that not only covers the usual examples in LFE, but also gives a full accounting of usage in Erlang's more common Prolog-inspired syntax. The LFE Blog has a new post with code examples as well as a demonstration of the py supervision tree (e.g., killing Python server processes and having them restart automatically) which hasn't actually made it into the README yet -- so get it while it's hot!
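For readers who haven't met supervision trees before, the restart behaviour can be imitated, very loosely, in plain Python. The sketch below is not how py or ErlPort work internally; `supervise`, its `max_restarts` parameter, and the two child commands are hypothetical names chosen purely for illustration. It simply re-runs a child process whenever that process exits with a non-zero status.

```python
import subprocess
import sys


def supervise(command, max_restarts=3):
    """Run `command`, restarting it whenever it exits with a non-zero status.

    Returns a (restarts, last_exit_code) tuple. This is a toy stand-in for
    the idea of a supervisor, not an imitation of OTP's restart strategies.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(command).returncode
        if exit_code == 0 or restarts >= max_restarts:
            return restarts, exit_code
        restarts += 1


# Hypothetical children: one that always crashes, one that exits cleanly.
crashing_child = [sys.executable, "-c", "import sys; sys.exit(1)"]
healthy_child = [sys.executable, "-c", "pass"]

crash_report = supervise(crashing_child, max_restarts=3)
healthy_report = supervise(healthy_child)
```

A real supervisor also decides what to do once the restart budget is spent, and that policy is exactly the kind of infrastructure the Erlang VM already provides.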

The most exciting bits are yet to come: there are open tickets for:

work on multiple Python server processes
scheduling code execution to these, and
full Python distribution infrastructure with parallel execution.

This could drastically change the picture for compute-intensive tasks in Erlang, Elixir, LFE, and Joxa. The Erlang VM was never intended to excel at the sort of problems that Python has traditionally focused on... yet it provides the sort of infrastructure that the Python community has been agonizing over for more than a decade. For Pythonistas, this may not be a very big deal ... but for the Erlang and functional programming communities, the LFE py project could be a life-saver for any number of projects which need easy access to the strengths of Python.

Friday, November 28, 2014

Scientific Computing with Hy and IPython

This blog post is a bit different than other technical posts I've done in the past in that the majority of the content is not on the blog or in gists; instead, it is in an IPython notebook. Since I adored Mathematica back in the 90s, you can imagine how much I love the IPython Notebook app. I'll have more to say on that at a future date.

I've been doing a great deal of NumPy and matplotlib again lately, every day for hours a day. In conjunction with the new features in Python 3, this has been quite a lot of fun -- the most fun I've had with Python in years (thanks Guido, et al!). As you might have guessed, I'm also using it with Erlang (specifically, LFE), but that too is for a post yet to come.

With all this matplotlib and numpy work in standard Python, I've been going through Lisp withdrawals and needed to work with it from a fresh perspective. Needless to say, I had an enormous amount of fun doing this. Naturally, I decided to share with folks how one can do the latest and greatest with the tools of Python scientific computing, but in the syntax of the Python community's best kept secret: Clojure-Flavoured Python (Github, Twitter, Wikipedia).

Spoiler: observed data and polynomial curve fitting

Looking about for ideas, I decided to see what Clojure's Incanter project had for tutorials, and immediately found what I was looking for: Linear regression with higher-order terms, a 2009 post by David Edgar Liebke.
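As a rough idea of what "linear regression with higher-order terms" means in practice, here is a minimal standard-library sketch in plain Python. It is not the notebook's code (which uses Hy with NumPy); `fit_polynomial` is a hypothetical helper, and the sample points are synthetic, generated from a known quadratic so the answer can be checked by eye.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    n = degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


# Synthetic data from y = 1 + 2x + 3x^2, so the fit should recover 1, 2, 3.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
```

The model is still "linear" in the sense that matters for regression: it is linear in the coefficients, even though the terms themselves are powers of x.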

Nearly every cell in the tutorial notebook is in Hy, and for that we owe a huge thanks to yardsale8 for his Hy IPython magics code. For those that love Python and Lisp equally, who are familiar with the ecosystems' tools, Hy offers a wonderful option for being highly productive with a language supporting Lisp- and Clojure-style macros. You can get your work done, have a great time doing it, and let that inner code artist out!

(In fact, I've started writing a macro for one of the examples in the tutorial, offering a more Lisp-like syntax for creating class methods. We'll see what Paul Tagliamonte has to say about it when it's done ... !)

If you want to check out the notebook code and run it locally, just do the following:

This will do the following:

Create a virtualenv using Python 3
Download all the dependencies, and then
Start up the notebook using a local IPython HTTP server

If you just want to read along, you're more than welcome to do that as well, thanks to the IPython NBViewer service. Here's the link: Scientific Computing with Hy: Linear Regressions.

One thing I couldn't get working was the community-provided code for generating tables of contents in IPython notebooks. If you have any expertise in this area, I'd love to get your feedback to see how I need to configure the custom ihy IPython profile for this tutorial.

Without that, I've opted for the manual approach and have provided a table of contents here:

Introduction

Preparation

If all goes well, you will enjoy that as much as I did :-)

More soon ...

Friday, November 21, 2014

ErlPort: Using Python from Erlang/LFE

Update 1: This post has a sequel here.

Update 2: There is a new LFE library that provides more idiomatic access to Python from LFE/Erlang by wrapping ErlPort and creating convenience functions. Lisp macros were, of course, involved in its making.

This is a short little blog post I've been wanting to get out there ever since I ran across the erlport project a few years ago. Erlang was built for fault-tolerance. It had a goal of unprecedented uptimes, and these have been achieved. It powers 40% of our world's telecommunications traffic. It's capable of supporting amazing levels of concurrency (remember the 2007 announcement about the performance of YAWS vs. Apache?).

With this knowledge in mind, a common mistake by folks new to Erlang is to think these performance characteristics will be applicable to their own particular domain. This has often resulted in failure, disappointment, and the unjust blaming of Erlang. If you want to process huge files, do lots of string manipulation, or crunch tons of numbers, Erlang's not your bag, baby. Try Python or Julia.

But then, you may be thinking: I like supervision trees. I have long-running processes that I want to be managed per the rules I establish. I want to run lots of jobs in parallel on my 64-core box. I want to run jobs in parallel over the network on 64 of my 64-core boxes. Python's the right tool for the jobs, but I wish I could manage them with Erlang.

(There are sooo many other options for the use cases above, many of them really excellent. But this post is about Erlang/LFE :-)).

Traditionally, if you want to run other languages with Erlang in a reliable way that doesn't bring your Erlang nodes down with badly behaved code, you use Ports (more info is available in the Interoperability Guide). This is what JInterface builds upon (and, incidentally, allows for some pretty cool integration with Clojure). However, this still leaves a pretty significant burden for the Python or Ruby developer for any serious application needs (quick one-offs that only use one or two data types are not that big a deal).

erlport was created by Dmitry Vasiliev in 2009 in an effort to solve just this problem, making it easier to use and integrate Erlang with more common languages like Python and Ruby. The project is maintained, and in fact has just received a few updates. Below, we'll demonstrate some usage in LFE with Python 3.

If you want to follow along, there's a demo repo you can check out:
Change into the repo directory and set up your Python environment:
Next, switch over to the LFE directory, and fire up a REPL:
Note that this will first download the necessary dependencies and compile them (that's what the [snip] is eliding).

Now we're ready to take erlport for a quick trip down to the local:
And that's all there is to it :-)

Perhaps in a future post we can dive into the internals, showing you more of the glory that is erlport. Even better, we could look at more compelling example usage, approaching some of the functionality offered by such projects as Disco or Anaconda.

Tuesday, March 12, 2013

Lisp Flavored Erlang

I've flirted with Lisp since the 90s, really started getting into it around 2008 when I started playing with genetic programming, and more recently investigated Common Lisp, Scheme (Gambit and Chicken), and Clojure with the intent of discovering the best way to write distributed programs in Lisp. (I even seriously explored implementing chunks of Twisted in Common Lisp. If only I had more time...)

Needless to say, I kept coming back to Erlang as it is a natural choice for the sort of concurrent programming I was interested in doing. On several occasions, I'd run across Robert Virding's Lisp-2 that he had written on top of the Erlang VM. At first blush, this appeared quite promising. Yet faced with the perceived burden of learning Erlang, I consistently set it aside for a future date.

"Excuse me, could I have some Erlang, please? Yes, just a cup. Oh, and can I get that Lisp-flavored? Thanks so much."

After bouncing between Clojure and CL, after running into difficulties with Termite on Chicken Scheme, and finally, after being quite impressed with the efforts made by the Elixir folks (who I believe took inspiration from LFE!), I decided to give LFE a chance. Within minutes of that decision, I came to two conclusions:

LFE is brilliant.
LFE needs docs like Elixir... and tutorials... and exposure! Why haven't I been using LFE all along?!

At which point, I started hacking on some docs to see if I could stick with it. When, after a few days, I proved to myself that I could, I contacted Robert and let him know not only how much I adored his masterpiece, but that I really wanted to write tons and tons of docs for it so that anyone could pick it up and start using it right away. I wanted to write docs for an audience like me: people who didn't know Erlang and weren't Lisp gurus.

This seemed like a doable goal, since I had about 5 minutes' worth of Erlang experience at the time I was having these conversations with Robert. I was learning Erlang at a rapid pace simply by virtue of the Lisp hook that LFE provided.

Our interactions led to the publicizing of the new docs site for Lisp Flavored Erlang on the LFE google groups list. We also created a Twitter account (we both have full access to it, but I tend to maintain it) whose sole purpose is to bring LFE to more people, keep the news around LFE fresh, etc.

"I could have sworn you just said 'Lisp'..."

A side note about Lisp: S-expressions are concise and elegant. Code and data using the same form is quite powerful. I do believe that the technology industry has the capacity to realize that old biases against Lisp are just that: old and outdated. The many dialects of Lisp are anything but. Clojure and (I believe) LFE are perfect examples of this. Whole new generations of programmers are delighting in the expressive power of a language whose roots can be traced back to actual manipulations of memory registers.

To resume the narrative: in the course of various efforts focused on documenting LFE, asking questions on the mail list, and having various other discussions, Robert pointed out that some of my coworkers at Rackspace had been working on Erlang projects. I subsequently reached out to Phil Toland. Then, within minutes of this (and entirely coincidentally), Kai Janson emailed a group of us about Erlang Factory SF and his desire to provide Erlang workshops for engineers at Rackspace.

This led to further conversations with Robert, then with Francesco, with several Rackers signing up for Erlang Factory this year, and finally, with me volunteering to put a Meetup together afterwards, hosted at Rackspace's SF office (more on that in a few hours).

For the curious, I do continue to work in Python and Twisted; I am excited about the new async support that Guido is spearheading for Python 3 and which has electrified so many hard-core Python hackers. Similarly, I continue to hack on Common Lisp projects. However, I am quite delighted that I have found a way to interface with Erlang which matches how I think, matches my aesthetics. And finally, I look forward to many fruitful years of LFE in my life :-)

Thanks Joe! Thanks Mike! Thanks Robert!

Wednesday, December 12, 2012

Async in Python 3

Update: Guido has been working on PEP 3156; check on it regularly for the latest! (In the last two hours I've seen it updated with three big content changes.)

The buzz has died down a bit now, but the mellowing of the roaring flames has resulted in some nice embers in which an async for Python 3 is being forged. This is an exciting time for those of us who 1) love Python and 2) can't get us enough async.

I wanted to take the time to record some of the goodness here before I forgot or got too busy working on something else. So here goes:

The latest bout of Python async fever started in September of 2012 in this message when Christian M. Amsüss emailed the Python-ideas mail list about the state of async in Python and the hopes that a roadmap could be decided upon for Python 3. Note that this is the latest (re)incarnation of conversations that have been going on for some time and for which there is even a PEP (with related work on github).

After a few tens of messages were exchanged, Guido shared his thoughts, starting with:

This is an incredibly important discussion.

This seemed to really heat things up, eventually with core Twisted and Tornado folks chiming in. I learned a tremendous amount from the discussions that took place. There's probably a book deal in all that for a motivated archivist/interviewer...

After this went on for chunks of September and October, Guido stated that he'd like to break the discussion up into various sub-topics:

reactors
protocol implementations
Twisted (esp. Deferred)
Tornado
yield from vs. Futures

This was done in order to prevent the original thread from going over 100 messages and to better organize the discussion... but wow, things completely exploded after that (in good ways, mostly). It was async open season, and the ringing of shots in the air seemed continuous. If you scroll to about the half-way point of the October archive page, you will see the first of these new threads ([Python-ideas] The async API of the future: Reactors). These messages essentially dominate the rest of the October archives. It's probably not unexpected that this continued into November. A related thread was started on Python-dev and it seemed to revive an old thread this month (on the same list).

All of this got mentioned on Reddit, too. It inspired at least two blog posts of which I am aware: one post by Steve Dower, and another by Allen Short. Even better, though, Guido started an exploratory project called Tulip to test out some of these ideas in actual running code. As he mentions in the README, a tutorial by Greg Ewing was influential in the initial implementation of Tulip and initial design notes were made in the message [Python-ideas] Async API: some code to review.

Shortly after that, some of the Twisted devs local to Guido met with him at his former office in San Francisco. This went amazingly well and revolved mostly around the pros and cons of separating the protocol and transport functionality. Guido started experimenting with that in Tulip on December 6th. Yesterday, a followup meeting took place at the Rackspace office, this time with notes.
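To make the protocol/transport split a little more concrete, here is a small sketch using asyncio as it later shipped in the Python standard library (Tulip's eventual home), not the December 2012 Tulip code. `UppercaseEcho` and `RecordingTransport` are hypothetical names invented for this illustration; the only real API relied on is `asyncio.Protocol`, `asyncio.Transport`, and their `connection_made`, `data_received`, and `write` methods.

```python
import asyncio


class UppercaseEcho(asyncio.Protocol):
    """Protocol: knows what to do with bytes, not where they come from."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data.upper())


class RecordingTransport(asyncio.Transport):
    """Hypothetical in-memory transport that just records what is written."""

    def __init__(self):
        super().__init__()
        self.sent = []

    def write(self, data):
        self.sent.append(data)


# No socket and no event loop: the protocol is driven by hand.
transport = RecordingTransport()
protocol = UppercaseEcho()
protocol.connection_made(transport)
protocol.data_received(b"hello")
```

One of the "pros" shows up immediately: because the protocol only ever talks to a transport object, it can be exercised without any network at all, and the same class could be handed to a real TCP or TLS transport unchanged.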

There's a long way to go still, but I find myself compulsively checking the commit log for Tulip now :-) It's exciting to imagine a future where Twisted and Tornado could easily interoperate with async support in Python 3 with a minimum of fuss. In fact, Glyph has already sketched out two classes which might be all that's needed for 2-way interoperation between Twisted and Python 3.

Here's to the future!

Monday, October 29, 2012

Async in Clojure: Playing with Agents, Part I

Clojure has a very interesting async primitive: the agent. There is some good documentation on agents, but for those that come from a background such as mine (Python and Twisted), I thought it might be nice to present one way of using agents to mimic the familiar async + callback reactive-style programming.

Do note, however, that Clojure agents run in one of two threadpools (one intended for CPU-intensive tasks, and the other for I/O-intensive tasks). As such, this is quite different than the event-loop approach that Twisted uses (or async frameworks that utilize libraries such as libevent or libev). Twisted has the deferToThread functionality, which is ... well, not exactly close, really. Regardless, let's get started.

In the following examples, we're going to pretend we have huge files we'll be reading off a local disk.

What to Call
Clojure's agent function is very, very simple: you pass it a value (its initial state) and some options, if needed. That's it.

To update its state, you use either the send or send-off functions. If you've got CPU-bound tasks whose state you want to manage with agents, then you should use the send function. If your tasks will be I/O-bound, then you should use the send-off function for updating agent state. (The threadpool dedicated for use by send has a fixed size, based on the number of processors on your system. The threadpool for send-off is expandable with thread caching and keep-alives.) Since our examples are focused on disk I/O, we'll be using send-off. (They have the same signature, though, so the following usage information applies to both.)

When you send-off something to an agent, you pass it a few things:

an agent
the action or update function
any number of additional parameters you want the action function to consume

Here's what that looks like:

What to Write
So, we know what an agent looks like when bound and we know how we're going to send an update to the agent, but how might we construct the update itself? Perhaps like this:

As you can see, the first value that an action function takes is the "old" value of the agent -- the value that the agent has prior to the action that will take place. Once this function returns, the agent's value will be set to the return value of the action function. (What's more, if we needed to access the agent itself inside the action function for any reason, we could do so using the *agent* variable -- accessible within the scope of the action function).
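For Pythonistas who want a mental model of those semantics, here is a toy imitation in Python. It is an analogy only, not Clojure's implementation and not the post's Clojure code: the `Agent` class, its `send_off` and `deref` methods, and the `big_read` action are hypothetical names, and an ordinary `ThreadPoolExecutor` stands in for the send-off thread pool.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Agent:
    """Toy Python imitation of a Clojure agent (illustrative only)."""

    # Hypothetical stand-in for the thread pool used by send-off.
    _pool = ThreadPoolExecutor(max_workers=8)

    def __init__(self, value):
        self._value = value
        # One action at a time per agent (this toy does not guarantee ordering).
        self._lock = threading.Lock()

    def deref(self):
        """Return the current value without waiting for pending actions."""
        return self._value

    def send_off(self, action, *args):
        """Schedule action(old_value, *args); its result becomes the new value."""
        def run():
            with self._lock:
                self._value = action(self._value, *args)
        return self._pool.submit(run)


def big_read(old_value, seconds):
    """Hypothetical slow I/O action: receives the old value, returns the new one."""
    time.sleep(seconds)
    return f"read finished after {seconds}s"


reader = Agent("not started")
pending = reader.send_off(big_read, 0.2)  # returns immediately
before = reader.deref()                   # action still sleeping: old value
pending.result()                          # wait for the action to finish
after = reader.deref()                    # now the action's return value
```

The two points to notice are the same as in the prose: the action function receives the old value as its first argument, and whatever it returns replaces the agent's value once it finishes.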

Before we go on, let's take a look at this in action from the REPL:

The first thing we do is switch from the default namespace to one dedicated to our examples (this makes managing scope in the REPL much cleaner). Then we load a file that has the agent and action function defined. Then we tell it to run our fake "big read" function, asking it to run for about 10 seconds. As you can see, send-off returns the agent immediately. We then get the current value of the agent by dereferencing it. Finally our big read finishes, and we see it print how long it took. We then look at the agent directly, and then dereference it again -- both showing us what we'd expect: that the value of the agent has been updated to the return value of our big read function. Finally, we shut down the agent threads and exit the REPL.

(The start-clojure script is wrapped with rlwrap so that I have access to a command line history, persistent over different sessions. The script boils down to this: rlwrap java -cp /usr/local/clojure-1.4.0/clojure-1.4.0.jar clojure.main.)
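For readers more at home in Python, the agent behaviour walked through above (send-off returning the agent immediately, the action receiving the "old" value, and the agent taking on the action's return value) can be loosely mimicked with a one-worker thread pool. This is only an analogy under my own assumptions, not how Clojure implements agents: the Agent class, its method names, and the big_read function are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Agent:
    """Toy stand-in for an agent: holds a value and updates it asynchronously."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()
        # One worker, so actions sent to this agent run one at a time, in order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def send_off(self, action, *args):
        """Queue action(old_value, *args) and return the agent immediately."""
        self._pool.submit(self._run, action, args)
        return self

    def _run(self, action, args):
        # The action sees the old value; its return value becomes the new one.
        new_value = action(self._value, *args)
        with self._lock:
            self._value = new_value

    def deref(self):
        """Return whatever value the agent holds right now."""
        with self._lock:
            return self._value

    def shutdown(self):
        """Wait for queued actions to finish, then stop the worker thread."""
        self._pool.shutdown(wait=True)


def big_read(old_value, seconds):
    """Hypothetical action: pretend to do slow I/O, then return the new state."""
    time.sleep(seconds)
    return "read finished after {}s (was: {!r})".format(seconds, old_value)
```

Calling Agent("no data").send_off(big_read, 10) hands back the agent straight away; a deref made before the sleep finishes will most likely still show the old value, and a deref made after shutdown shows the string returned by big_read.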

We've seen the agent in action now, but there's a bit more we can do. We'll take a look at that in the next post.

Thursday, October 18, 2012

Getting Started with Steel Bank Common Lisp

As some of you know, I've been a closet Lisp fan for several years. When I first joined Canonical in 2008, I was hacking on Lisp in Python, so that I could do genetic programming in Python. In fact, my first and only lightning talk at a Canonical sprint was on genetic algorithms and programming :-) (This was the same set of lightning talks in which Vincenzo Di Somma gave a wonderful presentation on his photography; completely unrelated: this is one of my favorite pics of Vincenzo :-) ).

A few years later, I talked to Jim Baker about Python's AST, and how one might be able to do genetic programming by manipulating it directly, instead of running a Lisp in Python.

Throughout all this time, I've been checking in with various community projects, hacking on various Lispy Things, reading, etc., but generally doing so quite quietly. Over the past few months, however, I've really gotten into it, and Lisp has become a real force in my life, rapidly playing just as dominant a role as Python.

Similarly, MindPool has become active in several Lisp projects; as such, there are a great many things to share now. However, before I begin all that, I'd like to take an opportunity to get folks up and running with an example Lisp environment.

Future posts will explore various areas of Common Lisp, Scheme dialects, I/O loops, etc., but this one will provide a basis for all future posts that relate to Common Lisp and specifically the Steel Bank implementation.

Installing SBCL
If you don't have SBCL (Steel Bank Common Lisp; a pun on its source parent, CMUCL), you need to install it:

For Ubuntu (12.04 LTS has 1.0.55): $ sudo apt-get install sbcl
Everyone else can go to the download page.

apt-get for Lisp
Next, you'll need to install Quicklisp (as you might have surmised, it's like Debian apt-get for Common Lisp). The instructions on this page will get you up and running with Quicklisp.

I like having quicklisp available when I run SBCL, so I did the following after installing Quicklisp (and you might want to as well) from the sbcl prompt:
* (ql:add-to-init-file)

Readline Support
The default installation of SBCL doesn't have readline support for the REPL, so using your arrow keys won't give you the expected result (your command history). To remedy that, you can use a readline wrapper. First, install rlwrap:

Ubuntu: $ sudo apt-get install rlwrap
Mac OS X: $ brew install rlwrap

Then, create the script ~/bin/start-sbcl, chmod it to 755, and give it the following content (make sure that ~/bin is in your PATH):
rlwrap sbcl

At which point you can run the following and have access to a command history in SBCL:
$ start-sbcl
*

Why Steel Bank?
CMUCL gained an excellent reputation for being a highly performant, optimized implementation of Lisp. Based on CMUCL and continuing this tradition of excellent performance, SBCL's reputation preceded it. Over a range of different types of programs, SBCL not only compares favorably to other Lisp dialects, it seriously kicks ass all over.

SBCL comes in at 8th place in that benchmark ranking, beating out Go in 9th place. Of all the languages that made it into the Top 10, I've only ever touched C, C++, Java, Scala, Lisp, and Go. In my list, SBCL made the Top 5 :-) Regardless, of all of them, Lisp has the syntax I find most pleasurable. Given my background in Python, this is not surprising ;-)

What's next?
Funny that you should ask... given my background with Twisted, I'll give you one guess ;-)

Monday, October 15, 2012

Rendering ReST with Klein and Twisted Templates

In a previous life, I spent about 25 hours a day worrying about content management systems written in Python. As a result of the battle scars built up during those days, I have developed a pretty strong aversion to a heavy CMS when a simple approach will do. Especially if the users are technologically proficient.

At MindPool, we're building out our infrastructure right now using Twisted so that we can take advantage of the super amazing numbers of protocols that Twisted supports to provide some pretty unique combined services for our customers (among the many other types of services we are providing). For our website, we're using the Bottle/Flask-inspired Klein as our micro web framework, and this uses the most excellent Twisted templating. (We are, of course, also using Twitter Bootstrap.)

Here's the rub, though: we want to manage our content in the git repo for our site with reStructuredText (ReST) files, and there's no way to tell the template rendering machinery (the flattener code) to allow raw HTML into the mix. As such, my first attempt at ReST support was rendering HTML tags all over the user-facing content.

This ended up being a blessing in disguise, though, as I was fairly unhappy with the third-party dependencies that had popped up as a result of getting this to work. After a couple false starts, I was hot on the trail of a good solution: convert the docutils-generated HTML (from the ReST source files) to Twisted Stan tags, and push those into the renderers.

This ended up working like a champ. Here's what I did:

Created a couple of utility functions for easily getting HTML from ReST and Stan from ReST.
Wrote a custom IRenderable for ReST content (not strictly necessary, but organizationally useful, given what else will be added in the future).
Updated the base class for "content" page templates to dispatch, depending upon content type.
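To give a feel for the HTML-to-tags step described above, here is a minimal sketch using only the standard library. It is not the MindPool code: it starts from an HTML string (the kind of thing docutils generates from a ReST file) and builds nested [tag, attributes, children] lists, which here merely stand in for Twisted's tag objects. The names TreeBuilder and html_to_tree are hypothetical, and the sketch assumes every tag is explicitly closed.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Collect HTML into nested [tag, attrs, children] lists.

    Assumes well-formed input in which every opened tag is closed.
    """

    def __init__(self):
        super().__init__()
        self.root = ["root", {}, []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Attach the new node to the current parent, then descend into it.
        node = [tag, dict(attrs), []]
        self.stack[-1][2].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Climb back up to the parent (never popping the root itself).
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Text becomes a plain string child of the current node.
        self.stack[-1][2].append(data)


def html_to_tree(html):
    """Return the list of top-level nodes parsed from an HTML fragment."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root[2]
```

Because the result is a tree of tags and strings rather than raw markup, a renderer can walk it and emit real elements instead of escaped HTML text, which is the whole point of the conversion.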

Afterwards I was rewarded with some nicely rendered content on the staging MindPool site :-) (once the content text has been completed, we'll be pushing it live).

Kudos to David Reid for Klein and (as usual) to the Twisted community for one hell of a framework that is the engine of my internet.

Saturday, May 12, 2012

Twisted SSH: Rendering a Log-in Banner/MOTD in Conch

A few weeks ago, I pinged my peeps on #twisted asking why the banner for a custom SSH server wasn't rendering properly. After some digging around and some inconsistent results (well, consistently bad results for me), we weren't able to resolve anything, and I had to set the problem aside.

The Symptom
The first thing I had tried was subclassing Manhole from twisted.conch.manhole, overriding (and up-calling) connectionMade, writing the banner to the terminal upon successful connection. This didn't work, so I then tried overriding initializeScreen by subclassing twisted.conch.recvline.RecvLine. Also a no-go. And by "didn't work" here's what I mean:

In both Linux (Ubuntu 12.04 LTS, gnome-terminal) and Mac (OS X 10.6.8, Terminal.app), after a successful login to the Twisted SSH server, the following sequence would occur:

1. an interactive Python prompt was rendered, e.g., ":>>"
2. the banner was getting written to the terminal, and
3. the terminal screen refreshed with the prompt at the top

This all happened so quickly that I usually never even saw #1 and #2. Just the second ":>>" prompt from #3. Only by scrolling up the terminal buffer would I see that the banner had actually been rendered. Even though I was doing my terminal.write after connectionMade and initializeScreen, it didn't seem to matter.

Discovery!
Some time last week, I put together example Twisted plugins showing what the problem was, and the circumstances under which a banner simply didn't get rendered. The idea was that I would provide some bare-bones test cases that demonstrated where the problem was occurring, post them to IRC or the Twisted mail list, and we could finally get it resolved. 'Cause, ya know, I really want my banners ...

While tweaking the second Twisted plugin example, I finally poked my head into the right method and discovered the issue. Here's what's happening:

twisted.conch.recvline.RecvLine.connectionMade calls t.c.recvline.RecvLine.initializeScreen
t.c.recvline.RecvLine.initializeScreen does a terminal.reset, writes the prompt, and then switches to insert mode. But this is a red herring. Since something after initializeScreen is causing the problem, we really need to be asking "who's calling connectionMade?"
t.c.manhole_ssh.TerminalSession.openShell is what kicks it off when it calls the transportFactory (which is really TerminalSessionTransport)
openShell takes one parameter, proto -- this is very important :-)
openShell instantiates TerminalSessionTransport
After calling the makeConnection method on an insults.ServerProtocol instance (the one I had tried overriding without success), TerminalSessionTransport does one more thing: it calls chainedProtocol.terminalProtocol.terminalSize. That makes this call the prime suspect for what was preventing the banner from being properly displayed.
chainedProtocol is an insults.ServerProtocol instance, and its terminalProtocol attribute is set when ServerProtocol.connectionMade is called.
A quick check reveals that terminalProtocol is none other than the proto parameter passed to openShell.

But what is proto? Some debugging (and the fact that of the three terminalSize methods in all of Twisted, only one is an actual implementation) reveals that proto is a RecvLine instance. Reading that method uncovers the culprit in our whodunnit: the first thing the method does is call terminal.eraseDisplay.

Bingo! (And this is what I was referring to above when I said "poked my head" ...)

Since this was called after all of my attempts to display a banner using both connectionMade and initializeScreen, there's no way my efforts would have succeeded.

Here's What You Do
How do you get around this? Easy! Subclass :-)

The class TerminalSessionTransport in t.c.manhole_ssh is the bad boy that calls terminalSize (which calls eraseDisplay). It's the last thing that TerminalSessionTransport does in its __init__, so if we subclass it, and render our banner at the end of our __init__, we should be golden. And we are :-)

You can see an example of this here.
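The ordering problem, and why writing the banner at the end of __init__ works, can also be shown with a toy model that doesn't import Twisted at all. Everything below is a hypothetical stand-in (FakeTerminal, FakeRecvLine, FakeSessionTransport and friends are my own names); a real fix would subclass TerminalSessionTransport from twisted.conch.manhole_ssh instead, and this sketch ignores details such as where the prompt ends up.

```python
class FakeTerminal:
    """Hypothetical terminal that just records what is currently on screen."""

    def __init__(self):
        self.screen = []

    def write(self, text):
        self.screen.append(text)

    def eraseDisplay(self):
        self.screen = []


class FakeRecvLine:
    """Stand-in for the protocol passed to openShell as proto."""

    ps = ":>> "

    def makeConnection(self, terminal):
        self.terminal = terminal
        self.connectionMade()

    def connectionMade(self):
        self.terminal.write(self.ps)

    def terminalSize(self, width, height):
        # Mirrors the culprit: the screen is wiped before the prompt is redrawn.
        self.terminal.eraseDisplay()
        self.terminal.write(self.ps)


class BannerInConnectionMade(FakeRecvLine):
    """The failed attempt: write the banner from connectionMade."""

    def connectionMade(self):
        FakeRecvLine.connectionMade(self)
        self.terminal.write("Welcome!\n")


class FakeSessionTransport:
    """Stand-in transport: connects the protocol, then calls terminalSize last."""

    def __init__(self, proto, terminal):
        proto.makeConnection(terminal)
        proto.terminalSize(80, 24)


class BannerSessionTransport(FakeSessionTransport):
    """The fix: up-call first, then write the banner after the erase."""

    def __init__(self, proto, terminal, banner):
        FakeSessionTransport.__init__(self, proto, terminal)
        terminal.write(banner)
```

With the first approach the banner is written and then erased by terminalSize; with the subclassed transport it is written after the erase and so survives on screen.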

Not sure if this sort of thing is better off in projects that make use of Twisted, or if it would be worthwhile to add this feature to Twisted itself. Time (and blog comments) will tell.

Epilogue
As is evident from the screenshot above (and the link), this feature is part of the DreamSSH project. There are a handful of other nifty features/shortcuts that I have implemented in DreamSSH (plus some cool ones that are coming) and I'm using them in projects that need a custom SSH server. I released the first version of DreamSSH last night, and there's a pretty clear README on the github project page.

One of the niftier things I did last night in preparation for the release was to dig into Twisted plugins and override some behaviour there. In order to make sure that the conveniences I had provided for devs with the Makefile were available for anyone who had DreamSSH installed, I added subcommands... but if the service was already running, these would fail. How to work around that (along with other Twisted plugin tidbits) is probably best saved for another post, though :-)

Saturday, March 24, 2012

A Conversation with Guido about Callbacks

In a previous post, I promised to share some of my PyCon conversations from this year -- this is the first in that series :-)

As I'm sure many folks noticed, during Guido van Rossum's keynote address at PyCon 2012, he mentioned that he likes the way that gevent presents asynchronous usage to developers taking advantage of that framework.

What's more, though, is that he said he's not a fan of anything that requires him to write a callback (at which point, I shed a tear). He continued with: "Whenever I see a callback, I know that I'm going to get it wrong. So I like other approaches."

As a great lover of the callback approach, I didn't quite know how to take this, even after pondering it for a while. But it really intrigued me that he didn't have the confidence in being able to get it right. This is Guido we're talking about, so there was definitely more to this than met the eye.

As such, when I saw Guido in the hall at the sprints, I took that opportunity to ask him about this. He was quite generous with his time and experiences, and was very patient as I scribbled some notes. His perspective is a valuable one, and gave me lots of food for thought throughout the sprints and well into this week. I've spent that intervening time reflecting on callbacks, why I like them, how I use them, as well as the in-line style of eventlet and gevent [1].

The Conversation

I only asked a few initial questions, and Guido was off to the races. I wanted to listen more than write, so what I'm sharing is a condensed (and hopefully correct!) version of what he said.

The essence is this: Guido developed an aesthetic for reading a series of if statements that represented async operations, as this helped him see -- at a glance -- what the overall logical flow was for that block of code. When he used the callback style, logic was distributed across a series of callback functions -- not something that one can see at a glance.

However, beyond the ability to perceive the intent of the code at a glance, there is something even more pragmatic: the ability to avoid bugs and, when they arise, to debug them clearly. A common place for bugs is in the edge cases, and for Guido those are harder to detect in callbacks than in a series of if statements. His logic is pretty sound, and probably generally true for most programmers out there.

He then proceeded to give more details, using a memcache-like database as an example. With such a database, there are some basic operations possible:

check the cache for a value
get the value if present
add a value if not present
delete a value

At first glance, this is pretty straightforward in both styles, with the in-line yielding code being more concise. However, what about the following conditions? What will the code look like in these circumstances?

an attempt to connect to the database failed, and we have to implement reconnecting logic
an attempt to get a lock fails because the key is already locked
in the case of a failed lock, do retries/backoff, eventually raising an exception
storing to multiple database servers, but one or more might not contain updated data
this leaves the system in an inconsistent state and requires all sorts of checking, etc.

I couldn't remember all of Guido's excellent points, so I made some up in that last set of bullets, but the intent should be clear: each of those cases requires code branching (if statements or callbacks). In the case of callbacks, you end up with quite a jungle [2]... a veritable net of interlacing callbacks, and the logic can be hard to follow.
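To make the contrast concrete, here is a small sketch of a "get the value, or add it if missing" operation written both ways. It is my own illustration rather than anything Guido wrote or any particular framework's API: FakeCache, get_or_add_callbacks, get_or_add_inline, and the run driver are hypothetical names, and the "database" is just a dict that answers immediately.

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcache-like store."""

    def __init__(self):
        self.data = {}

    # Callback-style operations: each hands its result to a callback.
    def get(self, key, callback):
        callback(self.data.get(key))

    def add(self, key, value, callback):
        self.data[key] = value
        callback(value)


def get_or_add_callbacks(cache, key, compute, done):
    """Callback style: the branching is spread across nested functions."""

    def on_get(value):
        if value is not None:
            done(value)
        else:
            cache.add(key, compute(), on_add)

    def on_add(value):
        done(value)

    cache.get(key, on_get)


def get_or_add_inline(cache, key, compute):
    """In-line style: each yield marks a point where we wait for a result."""
    value = yield ("get", key)
    if value is None:
        value = yield ("add", key, compute())
    return value


def run(cache, gen):
    """Tiny driver: performs each yielded request and sends the result back."""
    try:
        request = next(gen)
        while True:
            if request[0] == "get":
                result = cache.data.get(request[1])
            else:
                cache.data[request[1]] = request[2]
                result = request[2]
            request = gen.send(result)
    except StopIteration as stop:
        return stop.value
```

Even in this tiny case the in-line version reads top to bottom as a single if statement, while the callback version needs two extra functions; add reconnects, lock retries, and multi-server checks and the callback net grows accordingly.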

One final point that Guido made was that batching/pooling is much simpler with the in-line style, a point I conceded readily.

A Tangent: Thinking Styles

As mentioned already, this caused me to evaluate closely my use of and preference for callbacks. Should I use them? Do I really like them that much? Okay, it looks like I really do -- but why?

Meditating on that question revealed some interesting insights, yet it might be difficult to convey -- please leave comments if I fail to describe this effectively!

There are many ways to describe how one thinks, stores information in memory, retrieves data and thoughts from memory, and applies these to the solutions of problems. I'm a visual thinker with a keen spatial sense, so my metaphors tend to follow those lines, and when reflecting on this in the context of using and creating callbacks, I saw why I liked them:

The code that I read is just a placeholder for me. It happens to be the same thing that the Python interpreter reads, but that's a happy accident [3]; it references the real code... the constructs that live in my brain. The chains of callbacks that conditionally execute portions of the total-possible-callbacks net are like the interconnected deer paths through a forest, like the reticulating sherpa trails tracing a high mountain side, like the twisty mazes of an underground adventure (though not all alike...).

As I read the code, my eyes scan the green curves and lines on a black background and these trigger a highly associative memory, which then assembles a landscape before me, and it's there where I walk through the possibilities, explore new pathways, plan new architectures, and attempt to debug unexpected culs-de-sac.

Even stranger is this: when I attempt to write "clean" in-line async code, I get stuck. My mental processes don't fire correctly. My creative juices don't flow. The "inner eye" that looks into problem spaces can't focus, or can't get binocular vision.

The first thing I do in such a situation? Figure out how I can turn silly in-line control structures into callback functions :-) (see footnote [1]).

Now What?

Is Guido's astute assessment the death of callbacks? Well, of course not. Does it indicate the future of the predominant style for writing async Python code? Most likely, yes.

However, there are lots of frameworks that use callbacks and there are lots of people that still prefer that approach (including myself!). What's more, I'd bet that the callbacks vs. in-line async style comes down to a matter of 1) what one is used to, and possibly, 2) the manner in which one thinks about code and uses that code to solve problems in a concurrent, event-driven world.

But what, as Guido asked, am I going to do with this information?

Share it! And then chat with fellow members of the Twisted community. How can we better educate newcomers to Twisted? What best practices can we establish for creating APIs that use callbacks? What patterns result in the most readable code? What patterns are easiest to debug? What is the best way to debug code comprised of layers of callbacks?

What's more, we're pushing the frontiers of Twisted code right now, exploring reactors implemented on software transactional memory, digging through both early and recent research on concurrency and actor models, exploring coroutines, etc. (but don't use inlineCallbacks! Sorry, radix...). In other words, there's so much more to Twisted than what's been created; there's much more that lies ahead of us.

Regardless, Guido's perspective has highlighted the following needs within the Twisted community around the callback approach to writing asynchronous code:

education
establishing clear best practices
recording and publicizing definitive design patterns
continued research

These provide exciting opportunities for big-picture thinkers, both those new to Twisted and the more jaded old-timers. Twisted has always pushed the edge of the envelope (in more ways than one...), and I see no signs of that stopping anytime soon :-)

Footnotes

[1] In a rather comical twist of fate, I actually have a drafted blog post on how to write gevent code using its support for callbacks :-) The intent of that post will be to give folks who have been soaked in the callback style of Twisted a way of accepting gevent into their lives, in the event that they have such a need (we've started experimenting with gevent at DreamHost, so that need has arisen for me).

[2] There's actually a pretty well-done example of this in txzookeeper by Kapil Thangavelu. Kapil defined a series of callbacks within the scope of a method, organizing his code locally and cleanly. As much as I like this code, it is probably a better argument for Guido's point ;-)

[3] Oh, happy accident, let me count the hours, days, and weeks thy radiant presence has saved me ...

Saturday, March 17, 2012

Python for iOS

I do a lot of traveling, and I don't always like to lug my laptop around with me. Even when I do, I'd rather leave it in the bag unless I absolutely need to get it out (or if I'm setting up my mobile workspace). As such, I tend to use my iPhone for just about everything: reading, emails, calendar, etc.

So, imagine my delight, when I found out (just after PyCon this year) that I can now run Python 2.7.2 on my iPhone (and, when I get it, my iPad 3 ;-) ). This is just too cool for words... and given what pictures are worth, I'll use those instead :-)

I've put together a small Flickr set that highlights some of the functionality offered in this app, and each image in the set describes a nifty feature. For the image-challenged, here's a quick list:

an interactive Python prompt for entering code directly using the iPhone keyboard
a secondary, linear "keyboard" that one can use in conjunction with the main keyboard, extending one's ability to type faster
multiple options for working with/preserving one's code (email, saving to a file, viewing command history)

I can't even begin to count the number of times such an awesome Python scratchpad would have come in handy. And now we have it :-) At $2.99, this is a total steal.

Thanks Jonathan Hosmer!

(And thanks to David Mertz for pointing it out to folks on a Python mail list.)

Friday, March 16, 2012

PyCon 2012: To Be Continued

PyCon was just fabulous this year.

It's been a couple years since I was able to go, and I was quite surprised by how much I had been missing it. The Python community is not only one of the most technically astute and interesting ones to which I belong, but also the kindest. That last point is so incredibly important, and it ends up fostering a very strong familial sense amongst its members.

There were so many good conversations with such great people: Anna, Alex, Guido, David (Mertz), Donovan, JP, Maciej, Allen, Glyph, Paul, Sean... the list goes on and on! Fortunately, I took notes (and even have some book recommendations to share!), so there are many blog posts to come :-)

But this has brought something into focus quite strongly for me: the interaction at PyCon is one of the most fertile grounds for me all year -- and going without it since Chicago has been a genuine drought! There were some folks at DreamHost that couldn't make it, and we've already started looking around at various local, mini Python conferences that we can attend. This was initially so that those who couldn't make PyCon could receive similar benefits. But now there's something equally important that's contributing to the importance of this search: attending local conferences will mean not as much time has to pass between those fertile interactions and that recharging that we give each other at such events.

Until next time, I hope all Pythonistas everywhere are getting ready for a great weekend :-) Those who have been traveling, I hope you get lots of rest and share with everyone the treasures gathered at this year's PyCon :-)

Monday, March 12, 2012

OpenStack at PyCon 2012 Sprints!

This is just a short post to give a shout out to some folks who are sprinting for OpenStack this year at PyCon. It's a small group, since the Folsom Design Summit and Conference is coming up in a few weeks.

One big surprise came last night when I got an email about Cisco's recent work with Layer 3 (blueprint) support in Quantum, and there were two Cisco folks here this morning to chat about that. Mark McClain (DreamHost) is digging deep into their work right now.

Yahoo! is remote-sprinting today, and they hope to be in the house tomorrow, to continue working on current improvements in DevstackPy. Mike Pittaro (La Honda Research), Jonathan LaCour and Doug Hellmann (DreamHost) are working with Yahoo! on that.

Mike Perez (DreamHost) is hacking on some additional improvements in Horizon for different storage backend representations. We've also chatted a bit about the latest efforts in Horizon for Quantum support (Michael Fork's work). Perez is also helping out tracking some bugs down in DevstackPy.

Special thanks to Mike Pittaro for improving the sprinting pages on the OpenStack wiki with links to previous work and discussions!

If you're keen on OpenStack and would like to dive in with some fellow hackers into the deep ends of Nova, Quantum, or Horizon, be sure to come by or pop in at #openstack-pycon on Freenode :-)

Tuesday, June 21, 2011

txStatsD Preview

Sidnei da Silva (of Plone fame) has recently created a Launchpad project for an async StatsD implementation. He's got code in place for review by any Twisted kingpins who'd like to give it a glance.

StatsD was originally created in 2008 as a Perl implementation at Flickr for their statistics counting, timing, and graphing needs. Engineers at Etsy ported this work to Node.js (which Sidnei based his version on). A few months ago a regular Python implementation was created (also based on the Node.js version).

More than another (excellent) addition to the tx family, txStatsD will provide folks with the luxury of collecting stats using a Python server without having to write any blocking code :-) Sidnei also implemented a graphite protocol and client factory for passing the messages along.

Enjoy, and let him know what you think!

Monday, May 23, 2011

Packt: A Publishing House for the Future

Since I first heard of them several years ago, I've viewed Packt as the underdog in the world of technical book publishing. In the past year or so, Packt seems to have gained greater and greater influence: their catalog continues to grow, they are attracting talented and knowledgeable engineers as authors, and their titles are things that I'm actually interested in.

Two examples of this are the books Expert Python Programming and Zenoss Core Network and System Monitoring. I received a copy of the former and blogged about my take on it. For the Zenoss book, last year I agreed to be a technical reviewer and am currently preparing a blog post on my pre- and post-publishing experiences.

In both cases, I agreed to work with Packt based solely on the technical merits of their works. However, my experience as a technical reviewer with them was so positive (I have had consistently excellent experiences with their staff over extended periods of time and on long-running conversations) that I have not only agreed to review more titles, but have read up on Packt themselves a bit. Here are some highlights from their wikipedia article:

They published their first book in 2004 (the same year Ubuntu started!).
Packt offers PDF versions of all of their books for download.
When a book written on an open source project is sold, Packt pays a royalty directly to that project.
As of March 2008, Packt's contributions to open source projects surpassed US $100,000 (I would love an updated stat on this, if anyone has a newer figure).
They went DRM-free in March 2009.
Packt supports and publishes books on smaller projects and subjects that standard publishing companies cannot make profitable.
Their stream-lined business model aims to give authors high royalty rates and the opportunity to write on topics that standard publishers tend to avoid.
Bonus: they also run the Open Source Content Management System Award.

These guys have some key things going for them:

They've got what appears to be a lean approach to business.
They know how to effectively crowd-source, keeping their overhead low.
They are rewarding both the authors as well as the open source projects.
Their titles continue to grow in diversity and depth.
They have an outstanding staff.

Oh, and I really like the user account management in their website! When I log in, I see a list of owned books, source code links for them, clear/clean UI, very easy to navigate. I can't emphasize this enough to vendors, service providers, etc.: if you want a loyal user base:

make a good product that lasts a long time;
make simple and great tools that enhance the experience of those products, that truly improve the experience of your users.

All in all, Packt really appear to be leaders in publishing innovation, taking lessons learned from the frontier of open source software and applying that to the older industry of publication production. I would encourage folks to evaluate Packt for themselves: if you like what you see, support them in readership and authorship :-)

I, for one, will continue to review titles that appeal to me personally and that I think others would enjoy as well. I have two books in the queue and three pending blog posts for the following titles:

And who knows, if I feel like writing a technical book at some point, you may see me in the Packt catalog, too ;-)

Sunday, November 07, 2010

Canonical and Codethink at Boston GNOME Summit

Today is the second day of the Boston GNOME Summit, and the second day of Canonical providing morning sustenance for the hackers here. Codethink and Canonical coordinated these efforts, with Codethink sponsoring food later today. It warms my heart that we can do this sort of thing.

Yesterday Cody Russell and I held a session about getting a gesture API into GTK 3.x. There were a great many questions about the uTouch framework, how we're handling multi-touch in the absence of MT support in X (coming in XInput 2.1), and what sort of dependencies would be needed (none! if GEIS is present on the system, gesture support will be added at build-time). At the end of the session, there was a consensus for Cody to present his plans to the GTK developers list and then to start getting branches reviewed for merge. We're hoping to make it for GTK 3.2.

In this vein, Cody and I have been hacking on libgrope for GTK compatibility, and this is serving as the sandbox for the GTK 3 gesture API development. My efforts have been focused on creating the GTK 2 Python C extension for grope. Given that the last time I coded C was in 1989 (and then a bit later in the mid-90s, when I had to hack a slackware driver to get ethernet working), this has been quite an effort. However, after a night and morning of hacking, I've got a handle on C extensions and am using the example code I wrote as the basis for pygrope. I've even managed to rope Barry Warsaw into reviewing the C extension code for us, to be sure we're not doing anything too crazy :-)

The Python C extension will be of immediate use to us in our test harness for gestures and exercising the stack. We will be creating a GTK app for recording user gestures for later playback and inclusion in test suites.
