Updating design information
grantnelson-wf committed Sep 9, 2024
1 parent e6a483d commit 5494656
Showing 2 changed files with 187 additions and 25 deletions.
203 changes: 178 additions & 25 deletions compiler/internal/dce/README.md
@@ -11,16 +11,22 @@ alive is unreachable from the entry point, unused, and considered dead.
The dead-code may be safely eliminated, i.e. not outputted to the JS file(s).

- [Idea](#idea)
- [Package](#package)
- [Named Types](#named-types)
- [Named Structs](#named-structs)
- [Interfaces](#interfaces)
- [Functions](#functions)
- [Variables](#variables)
- [Generics and Instances](#generics-and-instances)
- [Implementation](#implementation)
- [Design](#design)
- [Initially Alive](#initially-alive)
- [Naming](#naming)
- [Dependencies](#dependencies)
- [Examples](#examples)
- [Dead Package](#dead-package)
- [Grandmas and Zombies](#grandmas-and-zombies)
- [Side Effects](#side-effects)
- [Additional Notes](#additional-notes)

## Idea

@@ -29,6 +35,18 @@ is used since some conditions are difficult to determine even with a lot of
additional information.
We bias towards code being alive to ensure a functional result.

### Package

Package declarations (e.g. `package foo`). A package might be removable
when it is only used by dead-code. However, a package may be imported without
being used directly, for example to invoke some initialization or to implement
a link, so it is difficult to determine when removal is safe.
See the [Dead Package](#dead-package) example.

Currently (go1.20), we won't remove any packages, but someday the complexity
could be added to check for inits, side effects, links, etc., and then determine
whether any of those are alive or affect alive things.

### Named Types

Named type definitions (e.g. `type Foo int`) depend on
@@ -131,36 +149,142 @@ is `int`. This method in the instance now duck-types to
`interface { getValues() []int }` and therefore must follow the rules for
unexported methods.

## Implementation
Functions and named types may be generic but methods and unnamed types
may not be. This makes some things simpler. When a method with a receiver is
used, only the receiver's instance types are needed. The generic type or
function itself may not be needed since only the instances are written out.

However, this means that the same method, receiver, type, etc. names
will be used with different parameter types caused by different instance
types. When an interface is alive, the signatures for unexported methods
need to be qualified with the parameter types so that we know which instances
the interface is duck-typing to.
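
As a minimal sketch of why the parameter and result types matter, consider the hypothetical generic type `Foo` below (the names `Foo`, `valueGetter`, and `sum` are illustrative and not taken from GopherJS). The single `getValues` declaration yields a different signature for each instance, and only the `int` instance duck-types to the interface.

```go
package main

import "fmt"

// valueGetter has an unexported signature, so only types declared in
// this package can satisfy it.
type valueGetter interface {
	getValues() []int
}

// Foo is a hypothetical generic named type used only for illustration.
type Foo[T any] struct {
	values []T
}

// getValues is declared once, but each instance of Foo has its own
// signature: Foo[int] has getValues() []int, while Foo[string] has
// getValues() []string.
func (f Foo[T]) getValues() []T {
	return f.values
}

// sum depends on the interface, and therefore on the unexported
// signature getValues() []int, not on any particular receiver.
func sum(g valueGetter) int {
	total := 0
	for _, v := range g.getValues() {
		total += v
	}
	return total
}

func main() {
	ints := Foo[int]{values: []int{1, 2, 3}}
	fmt.Println(sum(ints)) // prints 6; Foo[int] duck-types to valueGetter

	// Foo[string] does not satisfy valueGetter, so sum(strs) would not
	// compile. Its getValues method is a different instance.
	strs := Foo[string]{values: []string{"a"}}
	fmt.Println(len(strs.getValues())) // prints 1
}
```

If `valueGetter` is alive, only the `getValues` method of `Foo[int]` can be reached through it, which is why the signature has to carry the instance's types.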

## Design

The design is created taking all the parts of the above idea together and
simplifying the justifications down to a simple set of rules.

### Initially Alive

- The `main` function in the `main` package
- The `init` functions in every included file
- Any variable initialization that has a side effect
- Anything not named
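
The following small program is a hedged illustration of the first three rules. The names `registry`, `register`, and `answer` are hypothetical. The variable `answer` is never read, yet its initializer must still run because it changes `registry`.

```go
package main

import "fmt"

// registry is shared state that initializers write to.
var registry = map[string]int{}

// register has a side effect: it mutates registry.
func register(name string, value int) int {
	registry[name] = value
	return value
}

// answer is never read, but its initializer has a side effect,
// so it has to be treated as alive.
var answer = register("answer", 42)

// init runs before main and is always alive.
func init() {
	registry["init"] = 1
}

// main is the entry point and is always alive.
func main() {
	fmt.Println(len(registry)) // prints 2
}
```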

### Naming

The following specifies what declarations should be named and how
the name should look. These names are later used to match dependencies
with declarations that should be set as alive.

Some names will have multiple parts: a primary name and a secondary name.
This is kind of like a first name and last name when a first name alone isn't
specific enough. This helps with matching multiple dependency requirements
for a declaration, i.e. both parts must be alive before the declaration is
considered alive.

Currently (go1.20), only unexported method declarations will have a secondary
name to support duck-typing with unexported signatures on interfaces.
If the unexported method is depended on then both names will be used.
If the receiver is alive and an alive interface has the unexported signature
then both names will be used to make the unexported method alive.
Since the unexported method is only visible in the package in which it is
defined, the package path is included in the name.

| Declaration | exported | unexported | non-generic | generic instance | primary name | secondary name |
|:------------|:--------:|:----------:|:-----------:|:----------------:|:-------------|:---------------|
| variables | x | x | x | - | `<package path>.<var name>` | - |
| functions | x | x | x | | `<package path>.<func name>` | - |
| functions | x | x | | x | `<package path>.<func name>[<instance type list>]` | - |
| named type | x | x | x | | `<package>.<type name>` | - |
| named type | x | x | | x | `<package>.<type name>[<instance type list>]` | - |
| method | x | | x | | `<package>.<receiver name>` | - |
| method | x | | | x | `<package>.<receiver name>[<instance type list>]` | - |
| method | | x | x | | `<package>.<receiver name>` | `<package path>.<method name>(<parameter type list>)(<result type list>)` |
| method | | x | | x | `<package>.<receiver name>[<instance type list>]` | `<package path>.<method name>(<parameter type list>)(<result type list>)` |
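
The table can be read as a small naming function. The sketch below is an assumption-laden illustration, not the GopherJS implementation: the helpers `withInstance`, `nameFuncOrType`, and `nameMethod` are hypothetical, the package path is used for every row, and `", "` is assumed as the separator inside the type lists.

```go
package main

import (
	"fmt"
	"go/token"
	"strings"
)

// declName is a hypothetical holder for the two name parts in the table.
type declName struct {
	primary   string
	secondary string // empty unless the declaration is an unexported method
}

// withInstance appends "[<instance type list>]" when there are type arguments.
func withInstance(name string, typeArgs []string) string {
	if len(typeArgs) == 0 {
		return name
	}
	return name + "[" + strings.Join(typeArgs, ", ") + "]"
}

// nameFuncOrType names a variable, function, or named type.
func nameFuncOrType(pkgPath, name string, typeArgs []string) declName {
	return declName{primary: withInstance(pkgPath+"."+name, typeArgs)}
}

// nameMethod names a method. The primary name is the receiver's name and
// only unexported methods get a secondary name built from the signature.
func nameMethod(pkgPath, recv string, typeArgs []string, method string, params, results []string) declName {
	n := declName{primary: withInstance(pkgPath+"."+recv, typeArgs)}
	if !token.IsExported(method) {
		n.secondary = pkgPath + "." + method +
			"(" + strings.Join(params, ", ") + ")" +
			"(" + strings.Join(results, ", ") + ")"
	}
	return n
}

func main() {
	fmt.Printf("%+v\n", nameFuncOrType("foo", "Max", []string{"int"}))
	fmt.Printf("%+v\n", nameMethod("foo", "Bar", []string{"int"}, "getValues", nil, []string{"[]int"}))
	fmt.Printf("%+v\n", nameMethod("foo", "Bar", nil, "String", nil, []string{"string"}))
}
```

Under these assumptions, the unexported method `getValues` on `foo.Bar[int]` gets the primary name `foo.Bar[int]` and the secondary name `foo.getValues()([]int)`, while the exported `String` method only gets the receiver's name.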

### Dependencies

The dependencies are initialized via two paths.

The first is dependencies that are specified in the expression code.
For example a function that invokes another function will be dependent on
that invoked function. When a dependency is added it will be added as one
or more names to the declaration that depends on it. It follows the
[naming rules](#naming) so that the dependencies will match correctly.

The second is structural dependencies that are specified automatically while
the declaration is being named. When an interface is named, it will
automatically add each of its unexported signatures as a dependency named
`<package path>.<method name>(<parameter type list>)(<result type list>)`.

Currently we don't filter unused packages so there is no need to automatically
add dependencies on the packages themselves. This is also why the package
declarations aren't named and therefore are always alive.
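
To make the matching of names and dependencies concrete, here is a minimal sketch of one way the rules above could be applied. It is not the GopherJS implementation: the `decl` struct, the `selectAlive` function, and the example names are all hypothetical, and a simple fixed-point loop stands in for whatever traversal the real code uses.

```go
package main

import "fmt"

// decl is a hypothetical, simplified declaration record.
type decl struct {
	label     string   // human-readable label, used only by this sketch
	primary   string   // empty means unnamed, which is always alive
	secondary string   // empty when there is no secondary name
	deps      []string // names this declaration depends on
}

// selectAlive marks declarations alive until nothing changes. A named
// declaration becomes alive once every name part it has is alive, and an
// alive declaration makes all of its dependency names alive.
func selectAlive(decls []decl, seeds ...string) []string {
	aliveNames := map[string]bool{}
	for _, s := range seeds {
		aliveNames[s] = true
	}
	selected := make([]bool, len(decls))
	for changed := true; changed; {
		changed = false
		for i, d := range decls {
			if selected[i] {
				continue
			}
			if d.primary != "" && !aliveNames[d.primary] {
				continue
			}
			if d.secondary != "" && !aliveNames[d.secondary] {
				continue
			}
			selected[i] = true
			changed = true
			for _, dep := range d.deps {
				aliveNames[dep] = true
			}
		}
	}
	var alive []string
	for i, d := range decls {
		if selected[i] {
			alive = append(alive, d.label)
		}
	}
	return alive
}

// exampleDecls models a tiny hypothetical package foo used by main.
func exampleDecls() []decl {
	return []decl{
		{label: "func main", primary: "main.main", deps: []string{"foo.Bar", "foo.Getter"}},
		{label: "type Bar", primary: "foo.Bar"},
		{label: "method Bar.getValues", primary: "foo.Bar", secondary: "foo.getValues()([]int)"},
		{label: "method Bar.unused", primary: "foo.Bar", secondary: "foo.unused()()"},
		{label: "interface Getter", primary: "foo.Getter", deps: []string{"foo.getValues()([]int)"}},
		{label: "func Orphan", primary: "foo.Orphan"},
	}
}

func main() {
	for _, label := range selectAlive(exampleDecls(), "main.main") {
		fmt.Println(label)
	}
}
```

In this sketch `Bar.getValues` only becomes alive after the `Getter` interface contributes the signature name, while `Bar.unused` and `Orphan` stay dead because one of their names is never made alive.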

The implementation uses string matching to indicate aliveness.
A DCE Info represents a declaration of some code ([`Decl`](../decls.go)).
The DCE info and dependencies use
[`types.Object`](https://pkg.go.dev/go/types#Object)'s to determine the
strings for matching.
## Examples

When adding dependency to the DCE info, the following string is added:
### Dead Package

- The dependency's base string is the path and name (e.g. `foo.Bar`).
- If the object is a method (with a receiver), a tilde is added
(e.g. `foo.Bar~`).
- If the object is an instance of a generic, the instance types (type arguments)
are added (e.g. `foo.Bar[int, string]` or `foo.Bar[int, string]~`).
If any of the instance types are declarations then the path and name
is used with possible sub-instance type
(e.g. `foo.Bar[foo.Foo[map[string]int, bool]]`).
In this example, a point package is defined with a `Point` object.
The point package may be used by several repos as shared code.
For the current example, the `Distance` method is never used and therefore
dead. The `Distance` method is the only method dependent on the math package.
It might be safe to make the whole math package dead too and eliminate it in
this case. However, it is possible that some packages are intentionally
imported without being used and their reason for being included is to invoke
the initialization functions within the package. If a package has any inits
or any variable definitions with side effects, then the package cannot be
safely removed.

The DCE Info is given an object as the name(s) to use.
```go
package point

- If the object is a method (with a receiver) (e.g. `func (b Bar) baz()`):
- The DCE's name is the receiver's DCE's name,
i.e. the path and receiver name (e.g. `foo.Bar`).
- An additional name is added if the method is unexposed.
The additional name is the path and method name with a tilde
(e.g. `foo.baz~`)
- Else the DCE's name is the path and name (e.g. `foo.Bar`).
import "math"

## Examples
type Point struct {
	X float64
	Y float64
}

func (p Point) Sub(other Point) Point {
	p.X -= other.X
	p.Y -= other.Y
	return p
}

func (p Point) ToQuadrant1() Point {
	if p.X < 0.0 {
		p.X = -p.X
	}
	if p.Y < 0.0 {
		p.Y = -p.Y
	}
	return p
}

func (p Point) Manhattan(other Point) float64 {
	a := p.Sub(other).ToQuadrant1()
	return a.X + a.Y
}

func (p Point) Distance(other Point) float64 {
	d := p.Sub(other)
	return math.Sqrt(d.X*d.X + d.Y*d.Y)
}
```

```go
package main

import "point"

func main() {
	a := point.Point{X: 10.2, Y: 45.3}
	b := point.Point{X: -23.0, Y: 7.7}
	println(`Manhattan a to b:`, a.Manhattan(b))
}
```
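
As a separate, hedged illustration of a package that is imported only for its initialization, the standard library's `crypto/sha256` registers its hash with the `crypto` package from an `init` function. The helper `sha256Registered` is a hypothetical name for this sketch. Nothing here references `sha256` directly, yet removing the import would change the program's behavior.

```go
package main

import (
	"crypto"
	_ "crypto/sha256" // imported only so that its init registers the hash
	"fmt"
)

// sha256Registered reports whether SHA-256 has been linked in and
// registered with the crypto package.
func sha256Registered() bool {
	return crypto.SHA256.Available()
}

func main() {
	fmt.Println(sha256Registered())
}
```

A package-level DCE would need to recognize such registrations before it could treat an apparently unused import as dead.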

### Grandmas and Zombies

@@ -246,3 +370,32 @@ func main() {
fmt.Println(`max count`, max) // Outputs: max count 8
}
```

## Additional Notes

This DCE is different from those found in
Muchnick, Steven S. “Advanced Compiler Design and Implementation.” (1997),
Chapter 18 Control-Flow and Low-Level Optimization,
Section 10 Dead-Code Elimination. And different from related DCE designs
such as Knoop, Rüthing, and Steffen. "Partial dead code elimination." (1994),
SIGPLAN Not. 29, 6, 147–158.
See [DCE wiki](https://en.wikipedia.org/wiki/Dead-code_elimination)
for more information.

Those discuss DCE at the block code level where the higher level
constructs such as functions and objects have been reduced to graphs of
blocks with variables, procedures, and routines. Since we want to keep the
higher level constructs during transpilation, we simply remove
the higher level constructs that are not being used.

Any variable internal to the body of a function or method that is unused or
only used for computing new values for itself is left as is.
The Go compiler and linters have requirements that attempt to prevent this
kind of function body dead code (so long as an underscore isn't used to quiet
usage warnings) and prevent unreachable code. Therefore, we aren't going to
worry about trying to DCE inside of function bodies or in variable initializers.

GopherJS does not currently perform JS tree shaking algorithms itself,
as discussed in [How Modern JavaScript Eliminates Dead Code](https://blog.stackademic.com/how-modern-javascript-eliminates-dead-code-tree-shaking-algorithm-d7861e48df40)
(2023), and provides no guarantees about the effectiveness
of running such an algorithm on the resulting JS.
9 changes: 9 additions & 0 deletions compiler/internal/dce/collector.go
@@ -2,6 +2,8 @@ package dce

import (
"errors"
"fmt"
"go/ast"
"go/types"
"strings"

@@ -48,6 +50,13 @@ func (c *Collector) DeclareDCEDepWithInstance(o types.Object, inst types.Instanc
return // Dependencies are not being collected.
}

ident := &ast.Ident{NamePos: o.Pos(), Name: o.Name()}
if inst, has := c.TypeInfo.Instances[ident]; has {
fmt.Printf(">> Instance found for %v: %v\n", ident, inst)
} else {
fmt.Printf(">> No instance found for %v\n", ident)
}

qualifiedName := o.Pkg().Path() + "." + o.Name()
if inst.TypeArgs != nil {
tps := make([]string, inst.TypeArgs.Len())