Removed conversion manager.
Main function now:
- creates the parser, and calls the parse method passing the input reader as an argument,
- gets the parsed text, and
- sends the parsed text to the output writers.
Changed parser to receive an input reader and parse the whole stream.
Changed tokenizer to receive an input reader and generate tokens for the whole stream.
Removed source text and source location from the tokenizer, as this information could be obtained from the AST.
Updated grammar to include a list of sentences at the top level.
Updated README.md.
rturrado committed Jan 17, 2023
1 parent 48a28ec commit ce4eddf
Showing 12 changed files with 434 additions and 395 deletions.
58 changes: 20 additions & 38 deletions README.md
@@ -239,10 +239,11 @@ The `main` function logic is quite simple:
- Parses the command line options.
- Creates an input reader.
- Creates a stream output writer (that will write to standard output), and, if requested by the user, a file output writer.
- Runs the conversion manager, passing it the reader and the writers.
- Creates a parser, passing the input reader as an argument, and calls its parse method to receive the parsed text.
- Sends the parsed text to the output writers.
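A minimal sketch of this flow, with plain streams standing in for the `input_reader`/`output_writer` objects and a stub `parse` in place of the real parser (all names here are illustrative, not the project's API):

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Stub for the real parser: it would convert number words to digits here;
// this sketch just forwards the whole stream unchanged.
std::string parse(std::istream& is) {
    std::ostringstream oss;
    oss << is.rdbuf();
    return oss.str();
}

// Parse the whole input once, then fan the parsed text out to every writer.
void run(std::istream& input, const std::vector<std::ostream*>& writers) {
    const std::string parsed_text{ parse(input) };
    for (auto* writer : writers) {
        *writer << parsed_text;
    }
}
```

Note how the parsed text is computed once and shared by all writers, which is what makes adding a second (file) writer cheap.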

Exceptions thrown either during the parsing of the command line options, or while creating the reader or the writers, are captured,
and make the program terminate.<br/>
Exceptions thrown during the parsing of the command line options, while creating the reader or the writers, or by the parser,
are captured, and make the program terminate.<br/>
Both readers and writers are implemented as runtime polymorphic objects. A pure virtual base class, e.g. `input_reader` defines an interface,
and concrete classes, e.g. `file_reader`, implement that interface.
Using polymorphic readers is not mandatory for the task, but makes the implementation symmetric to that of the writers.
@@ -276,44 +277,27 @@ Again, each concrete class holds a stream, in this case an output stream.<br/>
The `file_writer` constructor just checks that the file stream is good. It doesn't check whether the file already exists.
The base class just exposes one `write` method, which grabs the output stream and writes a text to it.

#### Conversion manager

The `conversion_manager`:
- reads an input text from an `input_reader`,
- processes it using a `parser`, and
- writes it out to a list of `output_writer`s.

It basically contains a static `run` function that:
- Keeps reading sentences from an `input_reader` until the end of the file is reached.
- Texts that do not form a sentence (i.e. that do not end in a period) are not converted. All the texts are written out though.
- Every input sentence that needs to be processed is sent to the `parser`, and the result of this parsing is appended to an output sentence.
- Once an input sentence has been processed, the output sentence is sent out to the different writers.

#### Tokenizer

The `tokenizer` receives a text upon construction, and regex searches it for different patterns (space, dash, period, or word).
This search is done at `operator()`, a coroutine that yields the found tokens back to the caller.
The `tokenizer` receives an `input_reader` upon construction, and keeps reading sentences from it until the end of the stream is reached.
Every sentence is regex searched for different patterns (space, dash, period, or word).
The reading of sentences is done at `operator()`, and the regex searches at `get_next_token()`.
Both methods form a nested coroutine that yields the found tokens back to the caller.
Notice that text not fitting any of the patterns will still be captured, either as a prefix of the search operation
or as a remainder of the search loop, and yielded as a token of type `other`.
Once the input text has been completely processed, an `end` token is yielded.

For debugging purposes, the `tokenizer` also keeps track of a *source location*,
an offset to the start of the returned token within the input text.
Once the stream has been completely processed, an `end` token is yielded.
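A non-coroutine sketch of that scan (the token and lexeme names are assumptions; the real `tokenizer` yields these lazily from a coroutine instead of collecting them in a vector):

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

enum class lexeme { space, dash, period, word, other, end };
struct token { lexeme lex; std::string text; };

std::vector<token> tokenize(const std::string& sentence) {
    // One alternative per pattern: space, dash, period, word.
    static const std::regex pattern{ R"((\s+)|(-)|(\.)|([a-zA-Z]+))" };
    std::vector<token> ret{};
    auto last_end = sentence.cbegin();
    for (auto it = std::sregex_iterator{ sentence.begin(), sentence.end(), pattern };
         it != std::sregex_iterator{}; ++it) {
        const auto& m = *it;
        if (m.prefix().length() > 0) {  // unmatched prefix -> 'other' token
            ret.push_back({ lexeme::other, m.prefix().str() });
        }
        const lexeme lex = m[1].matched ? lexeme::space
                         : m[2].matched ? lexeme::dash
                         : m[3].matched ? lexeme::period
                         : lexeme::word;
        ret.push_back({ lex, m.str() });
        last_end = sentence.begin() + m.position() + m.length();
    }
    if (last_end != sentence.cend()) {  // unmatched remainder -> 'other' token
        ret.push_back({ lexeme::other, std::string{ last_end, sentence.cend() } });
    }
    ret.push_back({ lexeme::end, "" });  // end-of-input token
    return ret;
}
```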

#### Lexer

The `lexer` hides the `tokenizer` implementation from the `parser`, and offers:
- two main methods: `advance_to_next_token` and `get_current_token`,
- two helper methods: `get_current_lexeme` and `get_current_text` to access the two members of a token, and
- two methods for debugging purposes: `get_source_text` and `get_source_location`.
- two main methods: `advance_to_next_token` and `get_current_token`, and
- two helper methods: `get_current_lexeme` and `get_current_text` to access the two members of a token.
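A sketch of that facade (a pre-filled vector stands in for the tokenizer coroutine; the method names mirror the list above, while the token shape is an assumption):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class lexeme { word, period, end };
struct token { lexeme lex; std::string text; };

class lexer {
    std::vector<token> tokens_{};  // stands in for the tokenizer coroutine
    std::size_t pos_{ 0 };
public:
    explicit lexer(std::vector<token> tokens) : tokens_{ std::move(tokens) } {}
    void advance_to_next_token() {
        if (pos_ + 1 < tokens_.size()) { ++pos_; }  // stick at the 'end' token
    }
    [[nodiscard]] const token& get_current_token() const { return tokens_[pos_]; }
    [[nodiscard]] lexeme get_current_lexeme() const { return tokens_[pos_].lex; }
    [[nodiscard]] const std::string& get_current_text() const { return tokens_[pos_].text; }
};
```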

#### Parser

The `parser`:
- is constructed from an input text corresponding to a sentence, i.e., a text ending in a period character;
- creates a `lexer`, passing it this input text, and an `AST` (Abstract Syntax Tree);
- calls a `start` method, where all the parsing is effectively done, and
- returns an output text via the `AST`.
The `parser` is constructed from an `input_reader`. It creates a `lexer`, passing it this input reader, and an `AST` (Abstract Syntax Tree).
The `parse` method calls a `start` method, where all the parsing is effectively done, and
returns an output text via the `AST`.

The `start` method is the entry point to a recursive descent parser implementation, based on an LL(1) grammar.
Typical recursive descent parser implementations define a function for each element of the grammar.
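A toy illustration of that pattern over a deliberately tiny grammar (not the project's): one function per rule, each consuming input and recursing.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy grammar, one function per rule (illustrative only):
//   digits         ::= digit rest_of_digits
//   rest_of_digits ::= digits | nothing
struct parser_state {
    std::string input{};
    std::size_t pos{ 0 };
};

bool parse_digit(parser_state& s) {
    if (s.pos < s.input.size() && std::isdigit(static_cast<unsigned char>(s.input[s.pos]))) {
        ++s.pos;  // consume one character and advance
        return true;
    }
    return false;
}

bool parse_digits(parser_state& s) {
    if (!parse_digit(s)) { return false; }  // digits ::= digit ...
    parse_digits(s);  // rest_of_digits ::= digits | nothing (stop on failure)
    return true;
}
```

With a single character of lookahead deciding each branch, this is exactly the LL(1) property the README's grammar is written for.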
@@ -326,15 +310,13 @@ Each of these functions can:

#### Abstract Syntax Tree

The `AST` is implemented as a vector of 2 types of nodes:
- text nodes, and
- number expression nodes.

Number expression nodes, likewise, are implemented as a vector of 2 types of nodes:
- integer nodes, and
- text nodes.
The `AST` is implemented as a vector of sentence nodes.
A sentence node, likewise, is implemented as a vector of two types of nodes: text nodes and number expression nodes.
And number expression nodes are, in turn, vectors of two types of nodes: text nodes and integer nodes.

The `AST` composes the output text for the `parser` by:
The `AST` offers two APIs: `dump()` and `evaluate()`. The only difference between these two methods is at the number expression level.
Dumping a number expression returns the original input text for that expression,
while evaluating a number expression performs the conversion from words to numbers. The `AST` performs this evaluation by:
- walking the vector of nodes,
- concatenating the text nodes, and
- for the case of a number expression, concatenating the value of the expression.
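A condensed sketch of that walk over a simplified node layout (the real `ast.h` below adds sentence nodes, a `dump()` counterpart, and the number-to-word map):

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct text_node {
    std::string data{};
    [[nodiscard]] std::string evaluate() const { return data; }
};

struct int_node {
    int data{};
    [[nodiscard]] std::string evaluate() const { return std::to_string(data); }
};

using node_t = std::variant<text_node, int_node>;

// Walk the nodes, concatenating text as-is and number expressions as digits.
std::string evaluate(const std::vector<node_t>& sentence) {
    std::string ret{};
    for (const auto& node : sentence) {
        std::visit([&ret](const auto& n) { ret += n.evaluate(); }, node);
    }
    return ret;
}
```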
91 changes: 77 additions & 14 deletions include/word_converter/ast.h
@@ -5,17 +5,54 @@
#include <numeric> // accumulate
#include <stdexcept> // runtime_error
#include <string> // to_string
#include <unordered_map>
#include <variant> // visit
#include <vector>


inline static const std::unordered_map<int, std::string> number_to_word_map{
{ 0, "zero" }, // zero
{ 1, "one" }, // one
{ 2, "two" }, // two to nine
{ 3, "three" },
{ 4, "four" },
{ 5, "five" },
{ 6, "six" },
{ 7, "seven" },
{ 8, "eight" },
{ 9, "nine" },
{ 10, "ten" }, // ten to nineteen
{ 11, "eleven" },
{ 12, "twelve" },
{ 13, "thirteen" },
{ 14, "fourteen" },
{ 15, "fifteen" },
{ 16, "sixteen" },
{ 17, "seventeen" },
{ 18, "eighteen" },
{ 19, "nineteen" },
{ 20, "twenty" }, // tens
{ 30, "thirty" },
{ 40, "forty" },
{ 50, "fifty" },
{ 60, "sixty" },
{ 70, "seventy" },
{ 80, "eighty" },
{ 90, "ninety" },
{ 100, "hundred" }, // a hundred
{ 1'000, "thousand" }, // a thousand
{ 1'000'000, "million" }, // a million
{ 1'000'000'000, "billion" } // a billion
};


struct invalid_number_expression_error : public std::runtime_error {
explicit invalid_number_expression_error(const std::string& message) : std::runtime_error{ "" } {
message_ += fmt::format("'{}'", message);
explicit invalid_number_expression_error(const std::string& number_expression_str) : std::runtime_error{ "" } {
message_ += fmt::format("'{}'", number_expression_str);
}
[[nodiscard]] const char* what() const noexcept override { return message_.c_str(); };
private:
std::string message_{ "invalid number expression error: " };
std::string message_{ "invalid number expression: " };
};


@@ -51,19 +88,21 @@ namespace ast {
struct text_node {
std::string data{};
explicit text_node(std::string text) : data{ std::move(text) } {}
[[nodiscard]] std::string to_string() const { return data; }
[[nodiscard]] std::string dump() const { return data; }
[[nodiscard]] std::string evaluate() const { return data; }
};


struct int_node {
int data{};
explicit int_node(int value) : data{ value } {}
[[nodiscard]] std::string to_string() const { return std::to_string(data); }
[[nodiscard]] std::string dump() const { return number_to_word_map.at(data); }
[[nodiscard]] std::string evaluate() const { return std::to_string(data); }
};


class number_expression_node {
using node_t = std::variant<int_node, text_node>;
using node_t = std::variant<text_node, int_node>;
using nodes_t = std::vector<node_t>;
private:
nodes_t nodes_{};
@@ -80,7 +119,14 @@ class number_expression_node {
});
return numbers_stack.value();
}
[[nodiscard]] std::string to_string() const {
[[nodiscard]] std::string dump() const {
std::string ret{};
std::ranges::for_each(nodes_, [&ret](auto&& node) {
std::visit([&ret](auto&& arg) { ret += arg.dump(); }, node);
});
return ret;
}
[[nodiscard]] std::string evaluate() const {
if (nodes_.empty()) {
return {};
}
@@ -103,24 +149,41 @@ class sentence_node {
void add(node_t n) {
nodes_.push_back(std::move(n));
}
[[nodiscard]] std::string to_string() const {
[[nodiscard]] std::string dump() const {
std::string ret{};
std::ranges::for_each(nodes_, [&ret](auto&& node) {
std::visit([&ret](auto&& arg) { ret += arg.to_string(); }, node);
std::visit([&ret](auto&& arg) { ret += arg.dump(); }, node);
});
return ret;
}
[[nodiscard]] std::string evaluate() const {
std::string ret{};
std::ranges::for_each(nodes_, [&ret](auto&& node) {
std::visit([&ret](auto&& arg) { ret += arg.evaluate(); }, node);
});
return ret;
}
};


class tree {
sentence_node start_;
using node_t = sentence_node;
using nodes_t = std::vector<node_t>;
private:
nodes_t nodes_{};
public:
void add(sentence_node n) {
start_ = std::move(n);
void add(node_t n) {
nodes_.push_back(std::move(n));
}
[[nodiscard]] std::string to_string() const {
return start_.to_string();
[[nodiscard]] std::string dump() const {
return std::accumulate(nodes_.begin(), nodes_.end(), std::string{}, [](const auto& total, const auto& node) {
return total + node.dump();
});
}
[[nodiscard]] std::string evaluate() const {
return std::accumulate(nodes_.begin(), nodes_.end(), std::string{}, [](const auto& total, const auto& node) {
return total + node.evaluate();
});
}
};

31 changes: 0 additions & 31 deletions include/word_converter/conversion_manager.h

This file was deleted.

8 changes: 7 additions & 1 deletion include/word_converter/grammar.ebnf
@@ -1,11 +1,17 @@
start ::= sentence
start ::= sentences

sentences ::= sentence rest_of_sentences
rest_of_sentences ::= sentences
| nothing

sentence ::= sentence_prefix sentence_body
sentence_prefix ::= text_without_number_expressions
sentence_body ::= number_expression rest_of_sentence_body
| period
| end
rest_of_sentence_body ::= text_without_number_expression sentence_body
| period
| end

text_without_number_expressions ::= text_without_number_expression text_without_number_expressions
| nothing
3 changes: 3 additions & 0 deletions include/word_converter/input_reader.h
@@ -72,3 +72,6 @@ class stream_reader : public input_reader {
return is_;
}
};


using input_reader_up = std::unique_ptr<input_reader>;
