A documentation generator

So you’ve written your amazing Python program which converts a Twitter feed into YouTube videos or whatever, and being a Good Programmer you hack together some nicely-formatted documentation for the API so that people can actually use it. Eventually you get bored of maintaining both this & the docstrings in the code and think there must be a way of producing the docs from the docstrings. A search of the standard library will turn up pydoc, but … have you seen its output?!

Further research led me to epydoc, which actually is pretty cool. However I eventually started to bang my head against some issues:
– Using epytext for markup you can’t include tables or images.
– You can if you use rst, although this needs docutils installed & a patch applying (thank you stackoverflow!).
– More problematically with rst you can’t link “thing.foo” as “something_else”.
– With rst you still have to give full image paths relative to the output dir, not relative to the module the docstring is in.
– Epydoc by default will try to import the module/package you’re documenting which for me was a problem, mostly because it then imported tkinter which took AGES.
– You can turn ‘introspection’ off, so it just parses rather than imports, but then docs for @property are not very clear or missing.
– Even with just parsing it was waaay too slow on my package; having to wait a couple of minutes to find out whether you’ve corrected a typo in your markup is just too painful IMO.
– The output is not very modifiable; changing the css allows some improvement of the default style but you can’t do anything about content & layout.

So what are the other options? Pydoc also uses importing to do its stuff, so that’s out even if we can fix the output. Sphinx is probably the “professional” approach but it won’t “just work”, seeming to need the autodoc extension & quite a bit of effort to get it to build docs from docstrings. Other negative points against it are:
– Seems like a sledgehammer to crack a nut.
– It’s used to produce docs for both Python itself & matplotlib but I find both of these quite hard to navigate as they don’t have a TOC for the current page which stays fixed in your browser.

Building your own is bad. But I thought I’d learn a lot so I’d have a go. To bring some sort of rigour to it here are the goals:
– Produce HTML to document the API of standalone modules or complete packages using their docstrings.
– Document modules, classes (with proper formatting of the constructor), methods, functions, properties, instance variables.
– Very very fast – just parse files, no importing.
– Allow easy linking to python objects in the package/module, with arbitrary link text.
– Support inheritance within a package, but not necessarily outside it.
– Provide beautiful & clear html output.
– Make it very easy to modify the structure of output.
– Ditto for the format of output.
– Allow incremental building; when tweaking markup in a module it shouldn’t be necessary to rebuild the whole package. It’s ok to have to do this at the end for indices etc though.
– Support embedding images with paths relative to the module.
– Support boxed & coloured warning/notes text.
– Support hideable ‘more’ items.
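As a rough illustration of the incremental-building goal, one simple option is to compare file modification times and only regenerate a module’s page when its source is newer than its output. This is just a sketch under that assumption; `needs_rebuild` and the file names are hypothetical, not part of any existing tool.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(source: Path, output: Path) -> bool:
    """Return True if the output is missing or older than its source."""
    if not output.exists():
        return True
    return output.stat().st_mtime < source.stat().st_mtime


# Small demonstration using a temporary directory and explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mymodule.py"      # hypothetical module
    out = Path(tmp) / "mymodule.html"    # hypothetical generated page
    src.write_text('"""Docs."""\n')

    stale_before = needs_rebuild(src, out)   # no output yet

    out.write_text("<html></html>")
    os.utime(src, (1000, 1000))              # source older than output
    os.utime(out, (2000, 2000))
    stale_after = needs_rebuild(src, out)

    os.utime(src, (3000, 3000))              # source edited after the build
    stale_edited = needs_rebuild(src, out)
```

Indices and cross-module pages would still need a full pass at the end, as noted in the goal itself.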

Approach:
– Use ast to parse module sourcecode and extract definitions and docstrings.
– Use RST as the markup for docstrings & docutils to convert these to html; I don’t massively like rst but it does have the required functionality and it looks like this conversion is quite hard. Ideally I’d like to be fully-compatible with epydoc using rst so that there is an easy changeover.
– Convert each module in a single pass, outputting documentation as we go rather than reading it, grouping objects, then producing output. I’m not sure whether this was the best decision, it felt faster but actually it might not save much time. The two effects of this are that 1) documentation occurs in the same order as the objects in the code, which actually I think is a good thing, and 2) there is no way to check that linked objects etc exist. But actually 2) means that incremental building works, so that’s not necessarily a bad thing.
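To make the first point of the approach concrete, here is a minimal sketch of pulling definitions and docstrings out of source code with the standard library’s `ast` module, in source order and without importing anything. The example source, `walk_definitions` and `is_property` are made up for illustration; this is not the code I actually wrote, and it ignores plenty of cases (instance variables, nested functions, `@x.setter` and so on).

```python
import ast

# Hypothetical module source; in practice this would be read from a file.
SOURCE = '''
"""Example module."""

class Feed:
    """A hypothetical feed class."""

    def __init__(self, url, limit=10):
        """Create a feed."""
        self.url = url

    @property
    def title(self):
        """The feed title."""
        return "untitled"

def convert(feed):
    """Convert a feed."""
'''


def is_property(node):
    """True if the function is decorated with a plain @property."""
    return any(
        isinstance(dec, ast.Name) and dec.id == "property"
        for dec in node.decorator_list
    )


def walk_definitions(node, prefix=""):
    """Yield (kind, dotted_name, docstring) tuples in source order."""
    for child in node.body:
        if isinstance(child, ast.ClassDef):
            name = prefix + child.name
            yield "class", name, ast.get_docstring(child)
            # Recurse into the class body to pick up methods and properties.
            yield from walk_definitions(child, name + ".")
        elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if is_property(child):
                kind = "property"
            elif prefix:
                kind = "method"
            else:
                kind = "function"
            yield kind, prefix + child.name, ast.get_docstring(child)


tree = ast.parse(SOURCE)
module_doc = ast.get_docstring(tree)
definitions = list(walk_definitions(tree))

for kind, name, doc in definitions:
    print(f"{kind:8} {name}: {doc}")
```

Because the decorators are visible in the tree, properties can be recognised from parsing alone, which is the case that was unclear or missing in epydoc with introspection turned off.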

Note: I wrote this post, didn’t post it, started writing code, then got bogged down inside `docutils`, which is what I was using to process RST. The ast parsing also got really complex, and it looked like it would only get more complex as it went on. I did however come up with a semi-neat recursive template formatter.
Subsequent research uncovered pdoc, which is /almost/ what I want except that:
– it doesn’t do anything special with the docstrings (e.g. linking)
– it imports modules, I think, which means you can’t use it for scripts intended for programs embedding Python interpreters (as names will be missing)
– to document packages (if they work at all) they have to be on Python’s path (why?).
– there’s still no easy templating/control over output format
I did however submit a PR which fixed Windows installs.
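The importing problem is easy to demonstrate in general terms. In the sketch below, the script depends on a module that only exists inside the host program (the name `host_app_api_that_does_not_exist` is invented for the example). Parsing it with `ast` still yields the docstring, whereas actually executing it, which is roughly what any import-based tool has to do, fails straight away.

```python
import ast

# Hypothetical script meant to run inside a program embedding Python.
SOURCE = '''
import host_app_api_that_does_not_exist

def run():
    """Entry point called by the host application."""
    host_app_api_that_does_not_exist.do_something()
'''

# Parsing only looks at the syntax, so the missing module does not matter.
tree = ast.parse(SOURCE)
docs = {
    node.name: ast.get_docstring(node)
    for node in tree.body
    if isinstance(node, ast.FunctionDef)
}

# Executing the module body runs the import statement and fails.
try:
    exec(compile(tree, "<script>", "exec"), {})
    imported = True
except ImportError:
    imported = False

print(docs)
print("import succeeded:", imported)
```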

Having reviewed the above I think the goals are still valid actually, except now I’d probably target Markdown and google-style docstrings. Although that does show that really both the docstring style and the markup language need to be plugins, probably.
