Another day, another project. Today I bring you Yagni: a static code analyzer for Clojure, designed to find the parts of your codebase that aren't in use.

Background

Given the general acceptance of "You aren't going to need it" as a principle, it seemed reasonable to want an automated way of identifying vestigial features. Since we're in Lisp-land, code is represented by the same data structures the language itself manipulates, so why not leverage that strength to programmatically traverse a codebase and see which parts are unused?

Per Martin Fowler's article on the subject, this plugin won't get rid of the cost of building or repairing an unnecessary feature, but it can at least eliminate the cost of carry by letting you know that you can remove the pertinent section[s] of code without fear of repercussion.

How It Works

Yagni works by first walking all of the findable namespaces within your project's :source-paths and identifying all of the interned vars within those namespaces. Like Eastwood, it will require your codebase's namespaces, so if loading a namespace fires the missiles, this will fire the missiles.

Pro tip: don't have load-time side effects.
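Here's a minimal sketch of that first step, assuming the namespaces have already been discovered and loaded (a library like tools.namespace can handle the discovery). The helper name interned-vars and the example.interned namespace are hypothetical; this is an illustration, not Yagni's own code.

```clojure
(ns example.interned)

;; Two throwaway vars so there is something to find.
(defn bar [] :bar)
(defn foo [] (bar))

(defn interned-vars
  "Hypothetical helper: returns the fully qualified symbol of every var
  interned in the given, already loaded, namespaces."
  [ns-syms]
  (set (for [ns-sym ns-syms
             [var-name _var] (ns-interns (the-ns ns-sym))]
         (symbol (str ns-sym) (str var-name)))))

;; Contains example.interned/foo, example.interned/bar and
;; example.interned/interned-vars itself.
(interned-vars '[example.interned])
```

Note that ns-interns only returns vars defined in that namespace, not ones pulled in via :refer, which is exactly the set an unused-code check cares about.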

Once Yagni has identified all of the interned vars in your project's namespaces, it walks the forms of those vars (Clojure, being a Lisp, makes this relatively easy). As Yagni walks the forms, it looks for symbols that resolve to the known interned vars, and builds a directed graph of references between those vars.
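A minimal sketch of the graph-building step, assuming each var's source form is already available as quoted data keyed by its fully qualified symbol. Its symbol resolution is far cruder than Clojure's (it ignores aliases, :refer, and local bindings that shadow a var name), and the names qualify, references and reference-graph are hypothetical.

```clojure
(ns example.graph)

(defn- qualify
  "Naive resolution: qualified symbols are taken as-is, and unqualified
  ones are assumed to live in home-ns. Aliases, :refer and shadowing
  locals are ignored."
  [home-ns sym]
  (if (namespace sym)
    sym
    (symbol home-ns (name sym))))

(defn references
  "The known vars mentioned anywhere in form. var-sym itself is dropped,
  since it always appears as the name in its own defn (this also drops
  plain recursion)."
  [known var-sym form]
  (->> (tree-seq coll? seq form)
       (filter symbol?)
       (map #(qualify (namespace var-sym) %))
       (filter known)
       (remove #{var-sym})
       set))

(defn reference-graph
  "Builds a directed graph {var #{vars it references}} from a map of
  {fully-qualified-var-symbol quoted-form}."
  [var->form]
  (let [known (set (keys var->form))]
    (into {} (for [[v form] var->form]
               [v (references known v form)]))))

(def sample-forms
  {'app.core/-main '(defn -main [& args] (run args))
   'app.core/run   '(defn run [args] (app.util/fmt args))
   'app.util/fmt   '(defn fmt [x] (str x))
   'app.util/old   '(defn old [] (fmt 1))})

(reference-graph sample-forms)
```

For the sample forms, the result maps app.core/-main to #{app.core/run}, app.core/run and app.util/old to #{app.util/fmt}, and app.util/fmt to #{}.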

It then repeatedly searches the graph, starting each time from one of a set of entrypoints. Entrypoints here should be thought of as the things that someone would invoke from outside your codebase - if we're talking about an application, you could think of a main method, while if we're talking about a library, a set of functions or a public API would be a better mental model.

By default, Yagni assumes the only entrypoint is the :main value in your project.clj, but you can include other entrypoints by listing them as namespace-qualified var names in a .lein-yagni file in your project's root directory.
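A sketch of how the entrypoint set might be assembled. Two assumptions here are not confirmed by the description above: that the :main namespace's entrypoint is its -main function, and that .lein-yagni holds one namespace-qualified var name per line. The helper name entrypoints is hypothetical, and reading the file from disk is left out.

```clojure
(ns example.entrypoints
  (:require [clojure.string :as str]))

(defn entrypoints
  "Hypothetical helper. ASSUMES :main names a namespace whose entrypoint
  is -main, and that the .lein-yagni contents list one namespace-qualified
  var name per line (blank lines ignored)."
  [main-ns yagni-file-contents]
  (into (if main-ns #{(symbol (str main-ns) "-main")} #{})
        (comp (map str/trim)
              (remove str/blank?)
              (map symbol))
        (str/split-lines (or yagni-file-contents ""))))

;; With :main app.core and a two-line .lein-yagni file, this yields
;; app.core/-main, app.api/handler and app.api/start.
(entrypoints 'app.core "app.api/handler\napp.api/start\n")
```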

As Yagni searches, it prunes the nodes it finds from the graph, ultimately leaving behind a subgraph of the original where the only remaining nodes are vars that were unreachable from the provided entrypoints.
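The search-and-prune step can be sketched as a plain graph traversal over an adjacency map {var #{referenced vars}}. This version runs a single breadth-first search seeded with all entrypoints at once, which leaves the same unreachable set as searching from each entrypoint in turn. The names reachable and orphan-subgraph are hypothetical.

```clojure
(ns example.reachability)

(defn reachable
  "Breadth-first search over an adjacency map {var #{referenced vars}},
  returning every var reachable from the entrypoints (inclusive)."
  [graph entrypoints]
  (loop [seen  #{}
         queue (into clojure.lang.PersistentQueue/EMPTY entrypoints)]
    (if-let [v (peek queue)]
      (if (seen v)
        (recur seen (pop queue))
        (recur (conj seen v) (into (pop queue) (get graph v))))
      seen)))

(defn orphan-subgraph
  "Prunes every reachable var, keeping only the unreachable vars and the
  edges between them."
  [graph entrypoints]
  (let [live (reachable graph entrypoints)]
    (into {} (for [[v refs] graph
                   :when (not (live v))]
               [v (set (remove live refs))]))))

(def sample-graph
  {'app.core/-main #{'app.core/run}
   'app.core/run   #{'app.util/fmt}
   'app.util/fmt   #{}
   'app.util/old   #{'app.util/fmt}
   'app.util/older #{'app.util/old}})

;; Leaves app.util/old (its edge to the live fmt is pruned) and
;; app.util/older, which still points at app.util/old.
(orphan-subgraph sample-graph #{'app.core/-main})
```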

From here, Yagni reports on these "orphaned" vars as being in one of two classes: children and parents.

A child in Yagni is a function that is called somewhere within the codebase, but there's no actual path from an entrypoint to it.

By contrast, a parent in Yagni is called by nothing, anywhere. It has no inbound references at all.
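A minimal sketch of that classification, assuming the orphan subgraph is an adjacency map {var #{referenced vars}}. Because anything a reachable var references is itself reachable, an orphan's callers can only be other orphans, so the orphan subgraph alone is enough to tell the two classes apart. classify-orphans is a hypothetical name.

```clojure
(ns example.classify)

(defn classify-orphans
  "Splits an orphan subgraph into :parents (referenced by nothing) and
  :children (referenced by some other orphan, yet unreachable)."
  [orphans]
  (let [referenced (set (mapcat val orphans))]
    {:parents  (set (remove referenced (keys orphans)))
     :children (set (filter referenced (keys orphans)))}))

(classify-orphans {'app.util/old   #{}
                   'app.util/older #{'app.util/old}})
;; => {:parents #{app.util/older}, :children #{app.util/old}}
```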

As the definitions of parents and children should suggest, parents call children, and not the other way around. Attempting to remove a child from your codebase without first making sure that its parents have been removed will cause your program to fail to compile (since the reference from parent to child will still exist), so if you decide to prune your codebase, it's a good idea to begin with the parents.
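Removing the parents can turn their children into new parents, so pruning naturally proceeds in rounds. The hypothetical removal-order function below sketches that peeling process over an orphan subgraph {var #{referenced vars}}. Orphans that only reference each other in a cycle never become parents, so the sketch reports them separately instead of guessing an order.

```clojure
(ns example.removal-order)

(defn removal-order
  "Returns {:batches [...] :stuck #{...}}. Each batch is the set of
  current parents, which can be deleted together before moving on to the
  next batch. Vars left over (cycles, and whatever only they reference)
  end up in :stuck."
  [orphans]
  (loop [graph   orphans
         batches []]
    (let [referenced (set (mapcat val graph))
          parents    (set (remove referenced (keys graph)))]
      (if (empty? parents)
        {:batches batches :stuck (set (keys graph))}
        (recur (apply dissoc graph parents)
               (conj batches parents))))))

(removal-order {'app.util/older   #{'app.util/old}
                'app.util/old     #{'app.util/ancient}
                'app.util/ancient #{}})
;; => {:batches [#{app.util/older} #{app.util/old} #{app.util/ancient}], :stuck #{}}
```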

Weaknesses

Everybody's a critic. Including me.

As you might imagine, while Yagni represents a reasonable approach to the problem at hand, it's also easily tricked. Circumventing some of those tricks is within the realm of possibility, while addressing the others is a much harder problem. Still, here are a few known weaknesses with the approach I've taken.

eval, read-string, et al - ah, yes; the bane of all static analyzers...eval. If you're dynamically creating code from some external input, or input that is only valid at runtime, there's just no easy way for Yagni to know what you're going to ask eval to do.

The same goes for non-idiomatic type usage - if you're referencing vars by casting strings, keywords, hex, bytecode, etc. to symbols before turning them into vars, Yagni isn't going to know what's intended to be a reference and what's not. It assumes that Clojure's idiomatic usage of symbols as references to vars is the only one it needs to concern itself with.
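To make both of these blind spots concrete, here's a toy stand-in for a form walker (mentioned-symbols, a hypothetical name) that collects every symbol in a form. In both definitions below, foo really does call app.core/bar at runtime, yet no bar symbol ever appears: one builds the call from a string, the other from a keyword.

```clojure
(ns example.blind-spots)

(defn mentioned-symbols
  "Every symbol appearing anywhere in form: a toy version of a naive
  form walk."
  [form]
  (set (filter symbol? (tree-seq coll? seq form))))

;; Both of these call app.core/bar at runtime, but only via data:
(def via-eval    '(defn foo [] (eval (read-string "(app.core/bar)"))))
(def via-keyword '(defn foo [] ((resolve (symbol "app.core" (name :bar))))))

(mentioned-symbols via-eval)    ; only defn, foo, eval and read-string
(mentioned-symbols via-keyword) ; only defn, foo, resolve, symbol and name
```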

branch logic - let's consider a somewhat malicious example: (defn foo [] (when false (bar))). Although (when false ...) doesn't come up that often in practice, hopefully you get the idea: there's a single deterministic outcome here (unless you're redefining false, in which case you're a monster), and it's that bar isn't going to get called.

Yagni's form-walker is naive to this sort of branch logic, and assumes the fact that a reference to bar exists within foo means bar's worth holding onto. And on some level, it's right - attempting to remove bar would cause the compiler to fail. Even so, there's an argument to be made for some smarter branching logic in the walker.
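A naive symbol walk (the toy mentioned-symbols below) does find bar in (defn foo [] (when false (bar))). One hypothetical refinement is to prune branches whose test is a literal false or nil before collecting symbols, sketched here with clojure.walk/prewalk. It is deliberately narrow: only literal tests are recognized, and the function names are illustrative rather than anything Yagni provides.

```clojure
(ns example.branches
  (:require [clojure.walk :as walk]))

(defn mentioned-symbols
  "Every symbol appearing anywhere in form (the naive walk)."
  [form]
  (set (filter symbol? (tree-seq coll? seq form))))

(defn- prune-dead-branch
  "Hypothetical refinement: (when false ...) and (when nil ...) become nil,
  and (if false then else) becomes else. Only literal false/nil tests are
  recognized; any other form is returned unchanged."
  [form]
  (if (and (seq? form)
           (#{'when 'if} (first form))
           (>= (count form) 2)
           (contains? #{false nil} (second form)))
    (when (= 'if (first form))
      (nth form 3 nil))
    form))

(defn live-symbols
  "Like mentioned-symbols, but skips branches that can never run.
  prewalk rewrites each form before descending into it."
  [form]
  (mentioned-symbols (walk/prewalk prune-dead-branch form)))

(mentioned-symbols '(defn foo [] (when false (bar)))) ; includes bar
(live-symbols      '(defn foo [] (when false (bar)))) ; no bar
```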

multi-arity functions - another straw man: (defn x ([] foo) ([y] bar)). If x is only ever called with no arguments, then the reference to bar is unimportant. As before, the same caveat exists with regard to compiler exceptions.
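Arity-aware walking might look something like this sketch, which rests on heavy assumptions: every call is a direct (x ...) form, the definition has no variadic (&) arity, docstring, or metadata, and higher-order uses like (map x coll) are ignored entirely. refs-by-arity, call-arities and live-refs are hypothetical names.

```clojure
(ns example.arity)

(defn- symbols-in [form]
  (set (filter symbol? (tree-seq coll? seq form))))

(defn refs-by-arity
  "For a multi-arity (defn name ([params] body...) ...) form, maps each
  arity's parameter count to the symbols its body mentions."
  [[_defn _name & arities]]
  (into {} (for [[params & body] arities]
             [(count params) (symbols-in body)])))

(defn call-arities
  "Argument counts of every direct call (f-sym ...) found inside form."
  [f-sym form]
  (set (for [x (tree-seq coll? seq form)
             :when (and (seq? x) (= f-sym (first x)))]
         (dec (count x)))))

(defn live-refs
  "Symbols referenced only by the arities that callers-form actually uses."
  [defn-form callers-form]
  (let [by-arity (refs-by-arity defn-form)
        used     (call-arities (second defn-form) callers-form)]
    (reduce into #{} (vals (select-keys by-arity used)))))

;; x is only ever called with no arguments, so bar drops out: #{foo}
(live-refs '(defn x ([] foo) ([y] bar)) '(defn caller [] (x)))
```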

non-interned var references - as a general statement, this is probably more your problem than mine (see the pro tip, above). That having been said - if you've got inlined code that isn't sitting in an interned var, Yagni will never know about it, nor any references it makes to the rest of your program.

Roadmap

I believe the project in its current form and version (0.1.1) satisfies the bare minimum for its stated objectives. Even so, there are aspects to it that are interesting but currently under-leveraged. For instance:

Further Graph Analytics

Given that Yagni goes to all the trouble of constructing a graph of the program's references, it feels like there's quite a bit more there that could be of interest. For instance, with regard to the orphaned vars, which parents call which children?

I'm not yet sure I have strong preferences about how such information should be reported, but I'm interested in exploring it further.
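One possible starting point, sketched under the assumption that the orphan subgraph is an adjacency map {var #{referenced orphan vars}}: for each parent, collect every orphan reachable from it with a depth-first search. parent->children is a hypothetical name, not an existing Yagni feature.

```clojure
(ns example.lineage)

(defn- reachable-from
  "Depth-first search from start over {var #{referenced vars}}."
  [graph start]
  (loop [seen #{} stack [start]]
    (if-let [v (peek stack)]
      (if (seen v)
        (recur seen (pop stack))
        (recur (conj seen v) (into (pop stack) (get graph v))))
      seen)))

(defn parent->children
  "Maps each orphaned parent to every orphan reachable from it."
  [orphans]
  (let [referenced (set (mapcat val orphans))]
    (into {} (for [p (keys orphans)
                   :when (not (referenced p))]
               [p (disj (reachable-from orphans p) p)]))))

;; p1 reaches c1 and, through it, c3; p2 reaches c1, c2 and c3.
(parent->children {'a/p1 #{'a/c1}
                   'a/p2 #{'a/c1 'a/c2}
                   'a/c1 #{'a/c3}
                   'a/c2 #{}
                   'a/c3 #{}})
```

Children shared by several parents (c1 and c3 here) are the ones that can only go once every parent listing them is gone.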

REPL Invocation

I'm generally a fan of being able to invoke core Leiningen plugin logic from the REPL. In the past I've leveraged Eastwood's and Cljfmt's abilities in this regard to write Vim plugins, and while I'm not sure I'd want to write a Vim plugin for Yagni, I can certainly see the value in being able to call it easily from the REPL.

Namespace Re-writing

For the truly lazy, some Slamhound-style functionality could be very cool - something to simply prune unused code for you as required, re-writing the relevant namespaces in the process. Potentially dangerous, but fun nonetheless.

Conclusion

As should be well established by now, I'm very enthusiastic about development tools. In an ideal world, I'd hope that my fellow Clojurists find Yagni to be as valuable as their current linter: the sort of thing that should be run as part of your continuous integration process for any branch that's a contender to be merged into master.

Whether or not that actually happens, I'm looking forward to finding ways to make Yagni even better. In that vein, I'm excited to see which issues, pull requests, and feature requests people come up with.

As always, I'll do my best not to bite.

Discuss this post on Hacker News or on Twitter (@venantius)

Thanks to Bill Cauchois (@wcauchois), Chris Dean (@ctdean), Keith Ballinger (@keithba) and Allen Rohner (@arohner) for providing early feedback on Yagni.