improve λ-match parse errors

HigherOrderCO · Sep 29, 2024 · 65cbdaf · 65cbdaf
1 parent bb25ba5
commit 65cbdaf
Show file tree

Hide file tree

Showing 2 changed files with 175 additions and 7 deletions.
diff --git a/book/main.kind b/book/main.kind
@@ -1,3 +1,7 @@
 main
-: U32
-= 3
+: ∀(n: Nat)
+  Nat
+= λ{
+  #s: λpred #z{}
+  #z: #z{}
+}
diff --git a/src/Kind/Parse.hs b/src/Kind/Parse.hs
@@ -83,7 +83,7 @@ withSrc parser = do
 
 parseTrivia :: Parser ()
 parseTrivia = P.skipMany (parseSpace <|> parseComment) where
-  parseSpace = do
+  parseSpace = P.try $ do
     P.space
     return ()
   parseComment = P.try $ do
@@ -233,10 +233,11 @@ parseSwi = withSrc $ do
 
 parseMat = withSrc $ do
   P.try $ P.string "λ{"
-  cse <- P.many $ P.try $ do
-    parseTrivia
-    P.char '#'
-    cnam <- parseName
+  cse <- P.many $ do
+    cnam <- P.try $ do
+      parseTrivia
+      P.char '#'
+      parseName
     parseTrivia
     P.char ':'
     cbod <- parseTerm
@@ -371,3 +372,166 @@ expandUses uses name =
   case filter (\(short, _) -> short `isPrefixOf` name) uses of
     (short, long):_ -> long ++ drop (length short) name
     []              -> name
+
+
+
+
+
+
+-- PROBLEM: when parsing a file like:
+
+-- main
+-- : ∀(n: Nat)
+  -- Nat
+-- = λ{
+  -- #s: λpred #z
+  -- #z: #z{}
+-- }
+
+-- we get the following error:
+
+-- PARSE_ERROR
+-- - expected: space, "//", "}"
+-- - detected:
+-- 5 |   #s: λpred #z
+
+-- /Users/v/vic/dev/kind2hs/book/main.kind 5:3
+
+-- which is blaming the first # character of the '#s: λpred #z' line. the problem is, the error is actually that we're missing a {} after #z on the end of the line.
+
+-- seems like this happens because, for some reason, the parser backtracks all the way to #s: ... once the 'λpred #z' part fails to parse. but that should NEVER happen.
+
+-- instead, the desired behavior is that, when an "uniquely defining" part of a parser parses, then we never backtrack. so, for example, when we parse:
+
+-- #s:
+
+-- inside the λ-match, we should parse this case's body (another term), and never backtrack further.
+
+-- similarly, when we parse the header of a lambda (λpred), we should parse this lambda's body (another term), and never backtrack.
+
+-- similarly, when we parse '#name', we know we're in a constructor, and we should never backtrack.
+
+-- in other words, backtracking should be used only to disambiguate parses that could be ambiguous.
+
+-- for example, parseSwi and parseMat share the same beginning:
+
+-- λ{
+
+-- but parseSwi parses a '0:', while parseMat parses a '#name:'
+
+-- I'm not familiar with parsec, so, I wrote the parser above, which works fine. but this backtracking behavior is going too far.
+
+-- how can I improve the parser to result in the desired behavior?
+
+-- remember: all parses should start by "checking" if we start with the proper header-part (for example, on swi, the header is 'λ{' then '0:'). and, once the "header" is parsed, then, all parses after that are mandatory. the backtracking occurs only on the "header-part" of each parser. and this applies to ALL parsers.
+
+
+-- To achieve the desired behavior of limiting backtracking to only the "header" part of each parser, you can use the `P.try` combinator selectively. Here's how you can modify the parsers:
+
+-- 1. Modify the `parseTerm` function to use `P.try` only for the initial part of each parser:
+
+-- ```haskell
+-- parseTerm = do
+  -- parseTrivia
+  -- P.choice
+    -- [ P.try parseAll
+    -- , P.try parseSwi
+    -- , P.try parseMat
+    -- , P.try parseLam
+    -- , P.try parseOp2
+    -- , P.try parseApp
+    -- , P.try parseAnn
+    -- , P.try parseSlf
+    -- , P.try parseIns
+    -- , P.try parseDat
+    -- , P.try parseCon
+    -- , P.try parseUse
+    -- , P.try parseLet
+    -- , P.try parseSet
+    -- , P.try parseNum
+    -- , P.try parseTxt
+    -- , P.try parseHol
+    -- , P.try parseMet
+    -- , parseRef  -- No need for P.try here as it's the last choice
+    -- ]
+-- ```
+
+-- 2. Modify individual parsers to use `P.try` only for the header part. For example:
+
+-- ```haskell
+-- parseSwi = withSrc $ do
+  -- P.try $ do
+    -- P.string "λ{"
+    -- parseTrivia
+    -- P.string "0:"
+  -- zero <- parseTerm
+  -- parseTrivia
+  -- P.string "_:"
+  -- succ <- parseTerm
+  -- parseTrivia
+  -- P.char '}'
+  -- return $ Swi zero succ
+
+-- parseMat = withSrc $ do
+  -- P.try $ P.string "λ{"
+  -- cse <- P.many $ P.try $ do
+    -- parseTrivia
+    -- P.char '#'
+    -- cnam <- parseName
+    -- parseTrivia
+    -- P.char ':'
+    -- cbod <- parseTerm
+    -- return (cnam, cbod)
+  -- parseTrivia
+  -- P.char '}'
+  -- return $ Mat cse
+
+-- parseCon = withSrc $ do
+  -- P.try $ P.char '#'
+  -- nam <- parseName
+  -- parseTrivia
+  -- P.char '{'
+  -- arg <- P.many $ P.try $ parseTerm
+  -- parseTrivia
+  -- P.char '}'
+  -- return $ Con nam arg
+-- ```
+
+-- 3. For parsers that don't need backtracking at all, remove `P.try` completely:
+
+-- ```haskell
+-- parseRef = withSrc $ do
+  -- name <- parseName
+  -- (_, uses) <- P.getState
+  -- let name' = expandUses uses name
+  -- return $ case name' of
+    -- "U32" -> U32
+    -- _     -> Ref name'
+-- ```
+
+-- These changes will ensure that backtracking only occurs at the beginning of each parser, allowing for better error reporting and preventing excessive backtracking.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+