
Profile typechecking performance #1031

Merged: 18 commits, Feb 22, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -54,7 +54,7 @@ jobs:
             ${{ runner.os }}-check-${{ github.ref }}-${{ matrix.agda }}-
             ${{ runner.os }}-check-refs/heads/master-${{ matrix.agda }}-
-      - name: Typecheck the whole formalization
+      - name: Typecheck library
         run: |
           cd master
           make check
2 changes: 1 addition & 1 deletion .github/workflows/clean-up.yaml
@@ -1,4 +1,4 @@
-name: cleanup caches generated by pull requests
+name: Clean up caches generated by pull requests
 on:
   pull_request:
     types:
2 changes: 1 addition & 1 deletion .github/workflows/pages.yaml
@@ -1,4 +1,4 @@
-name: Formalization website
+name: Build and deploy library website
 on:
   # To run this workflow manually
   workflow_dispatch:
79 changes: 79 additions & 0 deletions .github/workflows/profiling.yaml
@@ -0,0 +1,79 @@
name: Profile library typechecking
on:
  push:
    branches:
      - master

# Don't cancel previous runs of the same branch
concurrency:
  group: '${{ github.workflow }}-${{ github.head_ref || github.run_id }}'
  cancel-in-progress: false

jobs:
  typecheck-performance:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        agda: ['2.6.4']
    steps:
      - name: Checkout our repository
        uses: actions/checkout@v3
        with:
          path: master

      - name: Setup Agda
        uses: wenkokke/[email protected]
        with:
          agda-version: ${{ matrix.agda }}

      - name: Typecheck library with profiling
        run: |
          cd master
          make check-profile 2> temp/memory-results.txt | tee temp/benchmark-results.txt

      - name: Download previous typechecking profile
        uses: actions/cache/restore@v4
        with:
          path: './benchmark-cache'
          key: ${{ matrix.os }}-benchmark-${{ github.event.before }}

      - name: Process new profiling data for typechecking
        run: |
          cd master
          python3 scripts/typechecking_profile_parser.py \
            temp/benchmark-results.txt temp/memory-results.txt \
            temp/benchmark-results.json benchmark-cache/data.csv \
            ${{ github.sha }}

      - name: Merge JSON profiling data for typechecking
        uses: rhysd/github-action-benchmark@v1
        with:
          tool: 'customSmallerIsBetter'
          # Location of the new data
          output-file-path: './master/temp/benchmark-results.json'
          # Location of the aggregate data
          external-data-json-path: './benchmark-cache/data.json'

      - name: Save merged benchmark data
        uses: actions/cache/save@v4
        with:
          path: './benchmark-cache'
          key: ${{ matrix.os }}-benchmark-${{ github.event.after }}

      - name: Publish the CSV as an artifact
        uses: actions/upload-artifact@v4
        with:
          name: 'Library profiling history'
          path: './benchmark-cache/data.csv'

      - name: Prepare new revision of the profiling website
        run: |
          cd master
          mkdir benchmark-website
          cp website/benchmarks/index.html benchmark-website/
          echo 'window.BENCHMARK_DATA =' | cat - ../benchmark-cache/data.json > benchmark-website/data.js

      - name: Deploy the new profiling website
        run: |
          npx netlify-cli deploy --dir=benchmark-website
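For reference, the `customSmallerIsBetter` tool of `rhysd/github-action-benchmark` consumes a flat JSON list of named measurements. A minimal sketch of that shape (the values are made up; the entry names mirror the keys selected in `typechecking_profile_parser.py`):

```python
import json

# Illustrative sample of the JSON that output-file-path should point at:
# a list of {"name", "unit", "value"} objects, one per tracked metric.
sample = [
    {"name": "Total", "unit": "ms", "value": 1234567},
    {"name": "total_memory_in_use", "unit": "MiB", "value": 4567},
]

print(json.dumps(sample, indent=2))
```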
1 change: 1 addition & 0 deletions .gitignore
@@ -423,6 +423,7 @@ package-lock.json
 docs/
 html/
 book/
+temp/
 src/temp/
 src/everything.lagda.md
 SUMMARY.md
7 changes: 6 additions & 1 deletion Makefile
@@ -7,7 +7,7 @@
 everythingOpts := --guardedness --cohesion --flat-split
 # use "$ export AGDAVERBOSE=-v20" if you want to see all
 AGDAVERBOSE ?= -v1
-AGDARTS := +RTS -M4.0G -RTS
+AGDARTS := +RTS -M6.0G -RTS
 AGDAFILES := $(shell find src -name temp -prune -o -type f \( -name "*.lagda.md" -not -name "everything.lagda.md" \) -print)
 CONTRIBUTORS_FILE := CONTRIBUTORS.toml

@@ -20,6 +20,7 @@ CONTRIBUTORS_FILE := CONTRIBUTORS.toml
 # at is at least in a proper code block with syntax highlighting, albeit without
 # the agda-unimath chrome.
 AGDAHTMLFLAGS ?= --html --html-highlight=auto --html-dir=docs --css=website/css/Agda.css --only-scope-checking
+AGDAPROFILEFLAGS ?= --profile=modules +RTS -s -RTS
 AGDA ?= agda $(AGDAVERBOSE) $(AGDARTS)
 TIME ?= time

@@ -70,6 +71,10 @@ src/everything.lagda.md: agdaFiles
 check: ./src/everything.lagda.md
 	${TIME} ${AGDA} $?

+.PHONY: check-profile
+check-profile: ./src/everything.lagda.md
+	${AGDA} ${AGDAPROFILEFLAGS} $?
+
 agda-html: ./src/everything.lagda.md
 	@rm -rf ./docs/
 	@mkdir -p ./docs/
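The `+RTS -s -RTS` flags added above make the GHC runtime print a memory summary on stderr, which the workflow redirects to `temp/memory-results.txt`. A sketch of the lines `scripts/typechecking_profile_parser.py` looks for, using made-up numbers:

```python
import re

# Made-up excerpt of a GHC "+RTS -s" summary; the real text is emitted
# on stderr by `make check-profile`.
sample_stderr = """\
   5,123,456,789 bytes allocated in the heap
     123,456,789 bytes copied during GC
      98,765,432 bytes maximum residency (12 sample(s))
            4567 MiB total memory in use (0 MB lost due to fragmentation)
"""

# The same comma-tolerant pattern style the parser script uses
allocated = re.search(r"(\d+(?:,\d+)*) bytes allocated in the heap", sample_stderr)
allocated_mib = int(allocated.group(1).replace(",", "")) // (1024 * 1024)
print(allocated_mib, "MiB allocated")
```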
123 changes: 123 additions & 0 deletions scripts/typechecking_profile_parser.py
@@ -0,0 +1,123 @@
import argparse
import csv
import json
import os
import re


def parse_memory_profiling_data(filepath):
    results = dict()

    # Define patterns to match each line and their corresponding unit
    patterns = {
        "memory_allocated_in_heap": (r"(\d+(?:,\d+)*) bytes allocated in the heap", "B"),
        "memory_copied_during_GC": (r"(\d+(?:,\d+)*) bytes copied during GC", "B"),
        "maximum_residency": (r"(\d+(?:,\d+)*) bytes maximum residency", "B"),
        "memory_maximum_slop": (r"(\d+(?:,\d+)*) bytes maximum slop", "B"),
        "total_memory_in_use": (r"(\d+) MiB total memory in use", "MiB")
    }

    with open(filepath, 'r') as file:
        for line in file:
            for key, (pattern, unit) in patterns.items():
                match = re.search(pattern, line)
                if match:
                    value = int(match.group(1).replace(",", ""))
                    if key == "memory_maximum_slop":  # Convert maximum slop to KiB and truncate
                        value //= 1024
                        unit = "KiB"
                    elif unit == "B":  # Convert bytes to MiB and truncate
                        value //= 1024 * 1024
                        unit = "MiB"
                    results[key] = {"value": value, "unit": unit}

    return results


def parse_benchmark_results(input_path):
    benchmarks = dict()
    with open(input_path, 'r') as file:
        for line in file:
            # Match lines that end with "ms" indicating a timing result.
            # The whole number, commas included, is captured in one group;
            # a repeated group like (,\d+)* would only retain its last match.
            match = re.fullmatch(r'\s*(\S+)\s+(\d+(?:,\d+)*)ms\s*', line)
            if match:
                name = match.group(1)
                # Strip thousands separators before converting to an integer
                milliseconds = int(match.group(2).replace(',', ''))
                benchmarks[name] = {'value': milliseconds, 'unit': 'ms'}
    return benchmarks


def subdict(original_dict, keys_to_extract):
    if keys_to_extract is None:
        return original_dict
    return {key: original_dict[key] for key in keys_to_extract if key in original_dict}


def convert_dict_to_list(data, keys_to_extract=None):
    return [{'name': name, **details} for name, details in subdict(data, keys_to_extract).items()]


def save_github_action_benchmark_json(output_path, benchmarks, memory_stats, benchmark_keys, memory_keys):
    with open(output_path, 'w') as file:
        json.dump(convert_dict_to_list(benchmarks, benchmark_keys) + convert_dict_to_list(memory_stats, memory_keys), file, indent=2)


def read_existing_csv_to_dict(csv_path, commit_hash):
    # Initialize a dictionary to hold the CSV data
    data_dict = {}
    fieldnames = ['name', 'unit', commit_hash]

    # Check if the file exists and read its content
    if os.path.exists(csv_path):
        with open(csv_path, mode='r', newline='') as csvfile:
            reader = csv.DictReader(csvfile)
            # Update fieldnames with those found in the existing CSV, plus the new commit hash if necessary
            fieldnames = reader.fieldnames + [commit_hash] if commit_hash not in reader.fieldnames else reader.fieldnames
            for row in reader:
                data_dict[row['name']] = row

    return data_dict, fieldnames


def update_csv_data(data_dict, benchmarks, memory_stats, commit_hash):
    # Combine benchmarks and memory stats for easier processing
    combined_data = {**memory_stats, **benchmarks}

    # Update the data_dict with new or updated values
    for name, details in combined_data.items():
        if name not in data_dict:
            data_dict[name] = {'name': name, 'unit': details['unit']}
        data_dict[name][commit_hash] = details['value']


def write_csv_from_dict(csv_path, data_dict, fieldnames, commit_hash):
    def custom_sort(item):
        # Sort memory stats (non-ms units) before timings, capitalized
        # names first, then timings by descending duration for this commit
        is_timing = item['unit'] == 'ms'
        name_key = not is_timing or item['name'][0].islower()
        value_key = -item[commit_hash] if is_timing and commit_hash in item else 0
        return (is_timing, name_key, value_key)

    with open(csv_path, mode='w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        # Sort the data based on the custom sort function before writing
        for row in sorted(data_dict.values(), key=custom_sort):
            writer.writerow(row)


def save_as_csv(benchmarks, memory_stats, csv_path, commit_hash):
    data_dict, fieldnames = read_existing_csv_to_dict(csv_path, commit_hash)
    update_csv_data(data_dict, benchmarks, memory_stats, commit_hash)
    write_csv_from_dict(csv_path, data_dict, fieldnames, commit_hash)


if __name__ == "__main__":
    # Set up argument parsing
    parser = argparse.ArgumentParser(description='Convert benchmark results to JSON format.')
    parser.add_argument('input_times_path', help='Path to the input file containing typechecking times.')
    parser.add_argument('input_memory_path', help='Path to the input file containing memory statistics.')
    parser.add_argument('output_json_path', help='Path to the output JSON file.')
    parser.add_argument('csv_path', help='Path to the profiling CSV file.')
    parser.add_argument('commit_hash', help='Commit hash for the current commit.')

    # Parse arguments from the command line
    args = parser.parse_args()

    # Use the provided command-line arguments
    benchmarks = parse_benchmark_results(args.input_times_path)
    memory_stats = parse_memory_profiling_data(args.input_memory_path)

    save_github_action_benchmark_json(args.output_json_path, benchmarks, memory_stats, ["Total"], ["total_memory_in_use"])
    save_as_csv(benchmarks, memory_stats, args.csv_path, args.commit_hash)
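A quick way to sanity-check the timing regex is to run it on hand-written lines in the shape `--profile=modules` reports (module names and times below are invented):

```python
import re

# Invented sample lines in the "<name>   <n>ms" shape the parser expects
lines = [
    "Total                              1,234,567ms",
    "foundation.identity-types              4,321ms",
    "not a timing line",
]

# Comma-tolerant pattern: the whole number is captured in a single group
pattern = re.compile(r'\s*(\S+)\s+(\d+(?:,\d+)*)ms\s*')
benchmarks = {
    m.group(1): int(m.group(2).replace(',', ''))
    for line in lines
    if (m := pattern.fullmatch(line))
}
print(benchmarks)
```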