-
I have actually done this exact thing before, many years ago, before ANTLR. With ANTLR, you can switch modes in the lexer and combine parsers, but I found that for this it was easier to just match the entire EXEC SQL statement and pass it off to a different parser. It is all doable, but needs a bit of care. I'm looking for work if you want to contract it out ;)
For the case sensitivity/insensitivity, you can code the lexer tokens for PL/SQL in a separate mode from the C lexer. Or just match the EXEC SQL in the C lexer and call a different lexer/parser from that point in the input stream up until the end of the EXEC SQL statement. How to do that depends on what your performance requirements are.
If you have taken it as far as a tree, then you probably went too far IMO - you can do this at lexing/parsing and then end up with a tree that represents both languages.
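To make the "match the entire EXEC SQL statement and pass it off" idea concrete, here is a minimal sketch in plain Java, without the ANTLR runtime. The class and method names (ExecSqlSplitter, extract) are made up for illustration, and it assumes the simple case where a statement runs from EXEC SQL to the first ';' outside a single-quoted SQL string. Embedded PL/SQL blocks that contain their own semicolons would need more care.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an ANTLR API): pulls embedded SQL statements out of
// Pro*C source text so each one can be handed to a separate SQL parser.
class ExecSqlSplitter {
    // "EXEC SQL" in any letter case, with arbitrary whitespace between the words.
    private static final Pattern START =
        Pattern.compile("\\bEXEC\\s+SQL\\b", Pattern.CASE_INSENSITIVE);

    /** Returns every embedded statement, from EXEC through its closing ';'. */
    static List<String> extract(String source) {
        List<String> statements = new ArrayList<>();
        Matcher m = START.matcher(source);
        int pos = 0;
        while (pos < source.length() && m.find(pos)) {
            int end = findTerminator(source, m.end());
            if (end < 0) {
                break; // unterminated statement: stop scanning
            }
            statements.add(source.substring(m.start(), end + 1));
            pos = end + 1;
        }
        return statements;
    }

    // Index of the first ';' at or after 'from' that is outside a
    // single-quoted SQL string, or -1 if there is none.
    private static int findTerminator(String s, int from) {
        boolean inString = false;
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'') {
                inString = !inString; // a doubled '' toggles twice, so state is unchanged
            } else if (c == ';' && !inString) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "int x;\nEXEC SQL SELECT 'a;b' INTO :x FROM dual;\nexec sql COMMIT;\n";
        for (String stmt : extract(src)) {
            System.out.println(stmt);
        }
    }
}
```

Each extracted string could then be fed to a PL/SQL lexer/parser on its own, which also keeps the case-insensitive matching confined to the SQL side. This sketch does not skip C comments or C string literals that happen to contain the words EXEC SQL.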
-
It's the outdated info because
-
Yes. It was slightly different because it was a compiler for my own company’s language; a form of BASIC that generated C. I put the EXEC SQL in that language so I could parse it and extract target variables etc., then generate the C equivalent. Same basic idea though - recognize the EXEC SQL, then switch lexers and parsers using the same input stream.
For sure, let’s talk. First thing to know is what you are looking to achieve, of course. I see what you mean with the tree now. There is probably a simpler approach, but I would need to see where you’re headed. It’s a solvable situation though, with a little discussion.
https://www.LinkedIn.com/jimidle
Talk later,
Jim
On Sun, Mar 12, 2023 at 13:49 Raffi Basmajian ***@***.***> wrote:
Hello Jim,
So you've actually done a Pro*C conversion? Interesting, what was the target language?
I'll reach out to you on LinkedIn, we should talk.
"If you have taken it as far as a tree, then you probably went too far IMO - you can do this at lexing/parsing and then end up with a tree that represents both languages."
We used parse trees to parse the PL/SQL *grammar*, not Pro*C code. We did that to extract a subset of grammar rules for handling the embedded SQL commands in Pro*C. I don't believe we need the entire PL/SQL grammar for parsing Pro*C; need to do more analysis.
-
Hey @raffian @KvanTTT @jimidle @aphyr @tonyarnold please check my bounty at
-
Our goal is to create a grammar for parsing source code consisting of ANSI C with embedded PL/SQL 11g/12c. This language, called Pro*C, was developed by Oracle in the 90's.
Embedded SQL in Pro*C always starts with "EXEC SQL", followed by an arbitrary PL/SQL block, and ends with ';'. There's a bit more to it than that, but for our needs that single rule covers 95% of our Pro*C legacy code.
Our theory for achieving this is to extract the subset of rules/tokens from the PL/SQL grammar that constitute the "body" portion of PL/SQL procedures and functions; we don't need the remaining PL/SQL rules. Once identified, we merge those rules into the C grammar, thus creating a Pro-C.g4 grammar. Based on our analysis, the starting point in the PL/SQL grammar is the rule called body:
https://github.com/antlr/grammars-v4/blob/master/sql/plsql/PlSqlParser.g4#L5457
Our experience with ANTLR is a few weeks at most, but here's what we've done so far to extract the body rule and its dependencies from the PL/SQL grammar:
1. Parse the PL/SQL grammar file itself using the ANTLRv4Parser.g4 grammar. We did this because extracting a subset of rules from the PL/SQL grammar requires parsing the grammar itself.
2. Visit each parser rule with visitParserRuleSpec(ANTLRv4Parser.ParserRuleSpecContext ctx).
3. Locate the body rule using if(ctx.getText().startsWith("body:")).
4. At the body: node, grab the raw text for the rule and extract all constituent rules/tokens from its alternatives. We don't understand ANTLR well enough to do this step with, perhaps, getChild()/Nodes, etc., so instead we use regex to extract the rules/tokens manually, then push them onto a stack.
Result
Of the 927 total rules and 2325 tokens in the PL/SQL grammar, the body rule is composed of 285 rules and nearly all the tokens.
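The regex-and-stack walk described in the steps above might look roughly like the following. This is only a sketch under assumptions: the RuleClosure class, its reachable method, and the toy rules in main are invented for illustration and are not taken from PlSqlParser.g4. It assumes the rule texts have already been pulled out of the grammar into a map from rule name to rule body.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every rule and token name reachable from a start rule.
class RuleClosure {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    // Quoted literals such as ';' or 'BEGIN' are blanked out before scanning,
    // so their contents are not mistaken for rule references.
    private static final Pattern LITERAL = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");

    static Set<String> reachable(Map<String, String> rules, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String name = stack.pop();
            if (!seen.add(name)) {
                continue; // already visited
            }
            String body = rules.get(name);
            if (body == null) {
                continue; // a token, or a name with no parser rule: nothing to expand
            }
            Matcher m = IDENT.matcher(LITERAL.matcher(body).replaceAll(" "));
            while (m.find()) {
                if (!seen.contains(m.group())) {
                    stack.push(m.group());
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Toy rules, made up for this example.
        Map<String, String> rules = new HashMap<>();
        rules.put("body", "BEGIN seq_of_statements END ';'");
        rules.put("seq_of_statements", "(statement ';')+");
        rules.put("statement", "body | NULL_");
        rules.put("create_table", "CREATE TABLE"); // not reachable from body
        System.out.println(reachable(rules, "body"));
    }
}
```

In the result, names starting with a lowercase letter are parser rules and names starting with an uppercase letter are tokens, following ANTLR's naming convention. A plain identifier regex will also pick up element labels and anything inside embedded actions, so the output would need a sanity check against the real grammar.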
The next steps will be to parse plsql blocks from our legacy code using this partial grammar but without any C grammar rules. Assuming those tests are successful, the last step will be to merge all extracted plsql rules and tokens into the C grammar though this step must be done carefully to avoid name conflicts. An easy solution we're exploring for that is prefixing all plsql rules and tokens with a qualifier, something like this:
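As a rough illustration of the prefixing idea (this is not the snippet from the original post; the plsql_/PLSQL_ prefixes and the GrammarPrefixer class are hypothetical), a rename pass over a rule's text could look like the sketch below. It uses two prefixes because ANTLR requires parser rule names to start with a lowercase letter and token names to start with an uppercase letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: prefixes every rule and token name in a piece of grammar text.
class GrammarPrefixer {
    // Matches either a quoted literal (left untouched) or an identifier (renamed).
    private static final Pattern PIECE =
        Pattern.compile("'(?:\\\\.|[^'\\\\])*'|[A-Za-z_][A-Za-z0-9_]*");

    static String prefix(String ruleText) {
        Matcher m = PIECE.matcher(ruleText);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(ruleText, last, m.start()); // copy punctuation and whitespace as-is
            String piece = m.group();
            if (piece.startsWith("'") || piece.equals("EOF")) {
                out.append(piece);                        // literal or built-in EOF: unchanged
            } else if (Character.isUpperCase(piece.charAt(0))) {
                out.append("PLSQL_").append(piece);       // lexer token
            } else {
                out.append("plsql_").append(piece);       // parser rule
            }
            last = m.end();
        }
        out.append(ruleText, last, ruleText.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(prefix("body: BEGIN seq_of_statements END ';'"));
    }
}
```

This sketch would also rename ANTLR keywords such as fragment, returns, or options if they appear in the text, and it does not understand embedded actions, so it only suits rule bodies that have been isolated first.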
So that's it - that's our plan. Is this approach worthwhile, or is it plagued with landmines and pitfalls not worth pursuing?
Update:
We're concerned about this note in the PL/SQL grammar readme. If case insensitivity matters to the PL/SQL grammar but not to the C grammar, what implications will this have when we merge the two grammars?
https://github.com/antlr/grammars-v4/tree/master/sql/plsql#readme
Thanks for listening,
Raffi