Refactoring Functional Programs

Refactoring Functional Programs Simon Thompson with Huiqing Li Claus Reinke www.cs.kent.ac.uk/projects/refactor-fp

Session 2

Overview • Review mini-project. • Implementation of HaRe. • Larger-scale examples. • Case study.

Mini-project feedback • Refactorings performed. • Refactorings and language features? • Machine support feasible? Useful? • ‘Not-quite’ refactorings? Support possible here?

Examples • Argument permutations (NB partial application). • (Un)group arguments. • Slice function for a component of its result. • Error handling / exception handling.

More examples • Introduce type synonym, selectively. • Introduce ‘branded’ type. • Modify the return type of a function fromTtoMaybeT, Either T S, [T]. • Ditto for input types … and modify variable names correspondingly.

Implementing HaRe

Proof of concept … • To show proof of concept it is enough to: • build a stand-alone tool, • work with a subset of the language, • pretty print the results of refactorings.

… or a useful tool? • Integrate with existing program development tools: stand-alone program links to editors emacs and vim, any other IDEs also possible. • Work with the complete language: Haskell 98? • Preservetheformattingand commentsin the refactored source code. • Allow users to extend and script the system.

The refactorings in HaRe • Rename • Delete • Lift / Demote • Introduce definition • Remove definition • Unfold • Generalise • Add / remove params Move def between modules Delete /add to exports Clean imports Make imports explicit Data type to ADT All these refactorings are module aware.

The Implementation of HaRe Information gathering Pre-condition checking Program transformation Program rendering

Information needed • Syntax: replace the function called sq, not the variable sq…… parse tree. • Static semantics: replace this function sq, not all the sqfunctions …… scope information. • Module information: what is the traffic between this module and its clients …… call graph. • Type information: replace this identifier when it is used at this type …… type annotations.

Infrastructure: decisions • Build a tool that caninteroperatewith emacs, vim, … yet actseparately. • Leverage existing libraries for processing Haskell 98, for tree transformation … as few modifications as possible. • Be as portable as possible, in the Haskell space. • Abstract interface to compiler internals?

Haskell landscape (end 2002) • Parser: many • Type checker: few • Tree transformations: few • Difficulties • Haskell 98 vs. Haskell extensions. • Libraries: proof of concept vs. distributable. • Source code regeneration. • Real project

Programatica • Project at OGI to build a Haskell system … • … with integral support for verification at various levels: assertion, testing, proof etc. • The Programatica project has built a Haskell front end in Haskell, supporting syntax, static, type and module analysis … • … freely available under BSD licence.

First steps … lifting and friends • Use the Haddock parser … full Haskell given in 500 lines of data type definitions. • Work by hand over the Haskell syntax: 27 cases for expressions … • Code for finding free variables, for instance …

Finding free variables ‘by hand’ • instance FreeVbls HsExp where • freeVbls (HsVar v) = [v] • freeVbls (HsApp f e) • = freeVbls f ++ freeVbls e • freeVbls (HsLambda ps e) • = freeVbls e \\ concatMap paramNames ps • freeVbls (HsCase exp cases) • = freeVbls exp ++ concatMap freeVbls cases • freeVbls (HsTuple _ es) • = concatMap freeVbls es • … etc.

This approach • Boilerplate code … 1000 lines for 100 lines of significant code. • Error prone: significant code lost in the noise. • Want to generate the boiler plate and the tree traversals … • … DriFT: Winstanley, Wallace • … Strafunski: Lämmel and Visser

Strafunski • Strafunski allows a user to write general (read generic), type safe, tree traversing programs, with ad hoc behaviour at particular points. • Top-down / bottom up, type preserving / unifying, full stop one

Strafunski in use • Traverse the tree accumulating free variables from components, except in the case of lambda abstraction, local scopes, … • Strafunski allows us to work within Haskell … • Other options? Generic Haskell, Template Haskell, AG, …

Rename an identifier • rename:: (Term t)=>PName->HsName->t->Maybe t • rename oldName newName =applyTPworker • where • worker =full_tdTP(idTP‘adhocTP‘idSite) • idSite :: PName -> Maybe PName • idSite v@(PN name orig) • | v == oldName • = return (PN newName orig) • idSite pn = return pn

The coding effort • Transformations: straightforward in Strafunski … • … the chore is implementing conditions that the transformation preserves meaning. • This is where much of our code lies.

Move f from module A to B • Is fdefined at the top-level of B? • Are the free variables in faccessible within module B? • Will the move require recursive modules? • Remove the definition of f from module A. • Add the definition to module B. • Modify the import/export lists in module A, Band the client modules ofAandB if necessary. • Change uses of A.f to B.f or f in all affected modules. • Resolve ambiguity.

Program rendering example • -- This is an example • module Main where • sumSquares x y = sq x + sq y • where sq :: Int->Int • sq x = x ^ pow • pow = 2 :: Int • main = sumSquares 10 20 • Promote the definition of sq to top level

Program rendering example • module Main where • sumSquares x y • = sq powx + sq powy where pow = 2 :: Int • sq :: Int->Int->Int • sq pow x = x ^ pow • main = sumSquares 10 20 • Using a pretty printer: comments lost and layout quite different.

Program rendering example • -- This is an example • module Main where • sumSquares x y = sq x + sq y • where sq :: Int->Int • sq x = x ^ pow • pow = 2 :: Int • main = sumSquares 10 20 • Promote the definition of sq to top level

Program rendering example • -- This is an example • module Main where • sumSquares x y = sq powx + sq powy • where pow = 2 :: Int • sq :: Int->Int->Int • sq pow x = x ^ pow • main = sumSquares 10 20 • Layout and comments preserved.

Token stream and AST • White space and comments in the token stream. • Modification of the AST guides the modification of the token stream. • After a refactoring, the program source is extracted from the token stream not the AST. • Heuristics associate comments with program entities.

Production tool Programatica parser and type checker Refactor using a Strafunski engine Render code from the token stream and syntax tree.

Production tool (optimised) Programatica parser and type checker Refactor using a Strafunski engine Render code from the token stream and syntax tree. Pass lexical information to update the syntax tree and so avoid reparsing

What have we learned? • Emerging Haskell libraries make it practical(?) • Efficiency and robustness • type checking large systems, • linking, • editor script languages (vim, emacs). • Limitations of editor interactions. • Reflections on Haskell itself.

Reflections on Haskell • Cannot hide items in an export list (cf import). • Field names for prelude types? • Scoped class instances not supported. • ‘Ambiguity’ vs. name clash. • ‘Tab’ is a nightmare! • Correspondence principle fails …

Correspondence • Operations on definitions and operations on expressions can be placed in one to one correspondence • (R.D.Tennent, 1980)

Definitions where f x y = e f x | g1 = e1 | g2 = e2 Expressions let \x y -> e f x = if g1 then e1 else if g2 … … Correspondence

f x | g1 = e1 f x | g2 = e2 Can ‘fall through’ a function clause … no direct correspondence in the expression language. f x = if g1 then e1 else if g2 … No clauses for anonymous functions … no reason to omit them. Function clauses

Work in progress • ‘Fold’ against definitions … find duplicate code. • All, some or one? Effect on the interface … • f x = … e … e … • Traditional program transformations • Short-cut fusion • Warm fusion

Where next? • Opening up to users: API or little language? • Link with other IDEs (and front ends?). • Detecting ‘bad smells’. • More useful refactorings supported by us. • Working without source code.

API Refactorings Refactoring utilities Strafunski Haskell

DSL Combining forms Refactorings Refactoring utilities Strafunski Haskell

Larger-scale examples • More complex examples in the functional domain; often link with data types. • Dawning realisation that can some refactorings are pretty powerful. • Bidirectional … no right answer.

Algebraic or abstract type? • data Tr a • = Leaf a | • Node a (Tr a) (Tr a) • flatten :: Tr a -> [a] • flatten (Leaf x) = [x] • flatten (Node s t) • = flatten s ++ • flatten t • Tr • Leaf • Node

Algebraic or abstract type? • Tr • isLeaf • isNode • leaf • left • right • mkLeaf • mkNode • data Tr a • = Leaf a | • Node a (Tr a) (Tr a) • isLeaf = … • isNode = … • … • flatten :: Tr a -> [a] • flatten t • | isleaf t = [leaf t] • | isNode t • = flatten (left t) • ++ flatten (right t)

 Pattern matching syntax is more direct … … but can achieve a considerable amount with field names. Other reasons? Simplicity (due to other refactoring steps?).  Allows changes in the implementation type without affecting the client: e.g. might memoise Problematic with a primitivetype as carrier. Allows an invariant to be preserved. Algebraic or abstract type?

Outside or inside? • Tr • isLeaf • isNode • leaf • left • right • mkLeaf • mkNode • data Tr a • = Leaf a | • Node a (Tr a) (Tr a) • isLeaf = … • isNode = … • … • flatten :: Tr a -> [a] • flatten t • | isleaf t = [leaf t] • | isNode t • = flatten (left t) • ++ flatten (right t)

Outside or inside? • Tr • isLeaf • isNode • leaf • left • right • mkLeaf • mkNode • flatten • data Tr a • = Leaf a | • Node a (Tr a) (Tr a) • isLeaf = … • isNode = … • flatten t = …

 If inside and the type is reimplemented, need to reimplement everything in the signature, including flatten. The more outside the better, therefore.  If inside can modify the implementation to memoise values of flatten, or to give a better implementation using the concrete type. Layered types possible: put the utilities in a privileged zone. Outside or inside?

Memoise flatten :: Tr a->[a] • data Tree a • = Leaf { val::a } | • Node { val::a, • left,right::(Tree a) } • leaf = Leaf • node = Node • flatten (Leaf x) = [x] • flatten (Node x l r) = • (x : (flatten l ++ flatten r)) • data Tree a • = Leaf { val::a, • flatten:: [a] } | • Node { val::a, • left,right::(Tree a), • flatten::[a] } • leaf x = Leaf x [x] • node x l r • = Node x l r (x : (flatten l ++ • flatten r))

Memoise flatten • Invisible outside the implementation module, if tree type is already an ADT. • Field names in Haskell make it particularly straightforward.

Refactoring Functional Programs