Add support for .org files #14

Open
luxbock opened this issue Jan 19, 2014 · 10 comments

Comments

@luxbock

luxbock commented Jan 19, 2014

This could probably be done much in the same way that nakkaya's static does it: by calling Emacs via clojure.java.shell to batch export org to html. Using it like this would make it quite different from the Markdown implementation as we now lack a function that translates the contents of the file to a html-string.

One approach would be to define an org-to-html function that takes an org-str, writes it to a temp file, has Emacs export the temp file to HTML, reads and returns the result, and then cleans up by deleting the temp org and HTML files. This approach seems a bit hacky, but then the rest of the implementation would look exactly like that of Markdown.
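
A minimal sketch of that temp-file approach, not a tested implementation. It assumes Emacs is on the PATH and ships Org 8 or later, where `org-html-export-to-html` writes an `.html` file next to the visited org file. The helper name `emacs-export-command` is made up for illustration, and the export produces a full HTML document rather than just a body fragment.

```clojure
(require '[clojure.java.shell :as shell]
         '[clojure.string :as s])
(import 'java.io.File)

(defn emacs-export-command
  "Hypothetical helper: the batch command that exports one org file to HTML."
  [emacs-cmd org-path]
  [emacs-cmd "--batch" org-path
   "--eval" "(require 'ox-html)"        ; assumption: Org 8+ HTML exporter
   "-f" "org-html-export-to-html"])

(defn org-to-html
  "Org string in, HTML string out, going through temp files."
  [org-str]
  (let [org-file  (File/createTempFile "incise-org" ".org")
        ;; The exporter writes next to the source, swapping the extension.
        html-file (File. (s/replace (.getPath org-file) #"\.org$" ".html"))]
    (try
      (spit org-file org-str)
      (let [{:keys [exit err]} (apply shell/sh
                                      (emacs-export-command "emacs" (.getPath org-file)))]
        (when-not (zero? exit)
          (throw (ex-info "Emacs export failed" {:exit exit :err err})))
        (slurp html-file))
      (finally
        ;; Clean up both temp files whether or not the export succeeded.
        (.delete org-file)
        (.delete html-file)))))
```

With a function of this shape, `org-to-html` could be handed to html-parser the same way the Markdown function is.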

The other approach would be to just register a custom org-specific function that does all the appropriate things that the html-parser would have done otherwise. I looked at the source code for a while to try to figure this out, but I was confused by this:

(defn File->Parse
  "Read in a file and turn it into a Parse instance."
  [content-fn ^File file]
  (let [file-str (slurp file)
        parse-meta (merge (conf/get :parse-defaults)
                          (edn/read-string file-str))
        content (content-fn (second (s/split file-str #"\}" 2)))]
    (meta-and-content->Parse parse-meta content)))

Is this the function that reads the (Markup) file from disk? I thought so, but it appears that what is being read is actually a Clojure map. I must be missing something.

How should one go about implementing this?

@RyanMcG
Owner

RyanMcG commented Jan 19, 2014

> Using it like this would make it quite different from the Markdown implementation as we now lack a function that translates the contents of the file to a html-string.

Very true. The markdown implementation uses the html-parser higher-order function, which makes generating a fully incise-friendly parser a breeze. Using html-parser is not mandatory, though (even for parsers which generate html files). The restrictions on parsers in incise are few.

  1. A parser is a function.
  2. It takes in a single file as its only argument. This file is dispatched to the parser by incise based on file extensions and by it being in the input directory.
  3. It returns a delay or thunk, which when invoked should return a sequence of files. This is for two-step parsing. Look at the README for the purpose of two step parsing. Basically it is to support tags and other things which require some idea of all files being parsed.
  4. Though not mandatory, that sequence of files should be the files generated by the parser and they should be written into the output directory (conf/get :out-dir), somewhere.
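
The four rules above can be sketched without any incise code at all. Everything here is illustrative: `upcase-parser` is a made-up parser that only upper-cases text, and `out-dir` is a stand-in for whatever `(conf/get :out-dir)` would return.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as s])

;; Assumption: a stand-in for incise's (conf/get :out-dir).
(def out-dir (io/file (System/getProperty "java.io.tmpdir") "incise-sketch-out"))

(defn upcase-parser
  "Rules 1 and 2: a plain function of a single source file."
  [file]
  (let [content (slurp file)]            ; step one: read the source eagerly
    (delay                               ; rule 3: step two runs only on deref
      (let [out-file (io/file out-dir (.getName (io/file file)))]
        (io/make-parents out-file)
        (spit out-file (s/upper-case content))
        [out-file]))))                   ; rule 4: the files that were written
```

Dereferencing the returned delay performs the write and yields the sequence of generated files.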

I tried to hit on this in the README, but it is, unfortunately, a rather confusing document that I should probably spend some more time revising.

> The other approach would be to just register a custom org-specific function that does all the appropriate things that the html-parser would have done otherwise.

Yep, that's probably what I would suggest here.

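(defn org-parser [org-file]
  ;; Step one: record the meta so other parsers can see it (optional).
  (incise.parsers.parse/record-parse (read-meta-from-file org-file))
  ;; Step two: export on deref and return the generated files.
  (delay [(shell-out-to-emacs org-file)]))

(pc/register :org org-parser)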

This is sort of pseudo code in that read-meta-from-file and shell-out-to-emacs are just made up functions. The parser author will have to implement them or something like them. Recording the parse (which really should be recording parse meta...I'll have to change that) is optional. It is necessary if you want tags or categories so that other parsers know about the files generated by your parser.

Parsers generated with html-parser make a few assumptions, one of which is that the source file has an edn map at the beginning of it (you may have noticed that all example source files start off with this). File->Parse is where that map is read in and the file is split at the first closing }. The remainder of the file is assumed to be content for the parser itself (the input string to the function passed in to html-parser). It's easily the sketchiest function in incise because of the naive way it splits the file.
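
To make the problem with the split concrete, here is a small sketch. `naive-content` repeats the split from File->Parse; `read-meta-and-content` is a hypothetical alternative (not what incise does) that lets the EDN reader decide where the map ends and treats whatever is left in the reader as content.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as s])
(import '(java.io PushbackReader StringReader))

(def flat-source   "{:title \"Hello\"}\nBody text")
(def nested-source "{:title \"Hello\" :extra {:a 1} :tags [:x]}\nBody text")

(defn naive-content
  "Same split as File->Parse: everything after the first }."
  [file-str]
  (second (s/split file-str #"\}" 2)))

(defn read-meta-and-content
  "Hypothetical alternative: read one EDN form, then take the rest as content."
  [file-str]
  (with-open [reader (PushbackReader. (StringReader. file-str))]
    (let [parse-meta (edn/read reader)]
      [parse-meta (slurp reader)])))

(naive-content flat-source)            ; => "\nBody text"
(naive-content nested-source)          ; => " :tags [:x]}\nBody text"
(read-meta-and-content nested-source)
;; => [{:title "Hello", :extra {:a 1}, :tags [:x]} "\nBody text"]
```

The nested map is the case that trips the naive version: the first } belongs to the inner map, so part of the meta leaks into the content.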

I guess the takeaway here is that to get nice things like tags, the parser has to be able to get some meta information about the source file, like tags, title, category, date, whatever. I'm not sure what the best way to do that is for org mode files. Without a meta map, you cannot record it for other parsers to read and you may not be able to use helper functions like incise.parsers.utils/meta->write-path.

I suspect you'll have some clarifying questions. Ha.

@luxbock
Author

luxbock commented Jan 20, 2014

Ah of course I should have checked out the example content first. Luckily for org, I think all of the information in the edn map can be represented in the org format itself, which means that I should also write functionality for parsing that information and then storing it in the meta map. What other content can the meta map include besides:

  • :layout (:post, :page)
  • :title
  • :date
  • :category
  • :tags, ?

Another cool thing about org is that it can be just as easily exported into a variety of formats (pdf, tex, ascii, utf-8, odf, freemind), so I definitely agree that the org-parser should not just piggyback off the html-parser like the Markdown implementation does. In addition to having a default export format defined in the config file, I could add support for a header tag such as #+EXPORT_FORMAT: pdf at the beginning of a file, which would then override the default export format.
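
As a rough sketch of pulling that information out of an org file, the function below collects `#+KEY: value` lines into a map. The name `org-header->meta` is made up, values are left as raw strings, and `#+EXPORT_FORMAT` is the keyword proposed in this comment rather than something Org itself defines.

```clojure
(require '[clojure.string :as s])

(defn org-header->meta
  "Hypothetical helper: collect #+KEY: value lines into a keyword map."
  [org-str]
  (into {}
        (for [line (s/split-lines org-str)
              :let [[_ k v] (re-matches #"#\+([A-Za-z_]+):\s*(.*)" line)]
              :when k]
          ;; TITLE -> :title, EXPORT_FORMAT -> :export-format
          [(keyword (s/replace (s/lower-case k) "_" "-")) (s/trim v)])))

(org-header->meta
 "#+TITLE: My post\n#+DATE: 2014-01-19\n#+EXPORT_FORMAT: pdf\n\n* Heading\nText")
;; => {:title "My post", :date "2014-01-19", :export-format "pdf"}
```

Turning the strings into dates, keywords or tag lists would still be up to the parser.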

For the config file, should everything specific to org go inside its own map, like this?

{:site-title "incise"
 :author "author"
 ...
 :org {:emacs-cmd "emacs"
       :eval-elisp ["(setq org-export-headline-levels 2)"]
       :default-export-format :html}}

To me this looks a bit cleaner.

@RyanMcG
Owner

RyanMcG commented Jan 20, 2014

> Luckily for org, I think all of the information in the edn map can be represented in the org format itself, which means that I should also write functionality for parsing that information and then storing it in the meta map.

Yes! Precisely!

> What other content can the meta map include besides ...

I suggest you look at incise.parsers.parse/Parse. Those are all of the keys defined in the parse record. There is no reason you couldn't store more, though. If removing type hints is necessary to do so, then go ahead. I've been thinking about getting rid of Parse entirely. I am worried it is more confusing than helpful.

> To me this looks a bit cleaner.

Agreed, it sounds like that'd be pretty nice.

My made up convention for parser specific configuration is to do something like this:

{:parsers {:org {:emacs-cmd "emacs"
                 :eval-elisp ["(setq org-export-headline-levels 2)"]
                 :default-export-format :html}}}

You can have a default config for your parser specified in your parser's namespace and merge this in from the user's config (something like: (incise.config/get-in [:parsers :org])). You can even allow the user to override this with a specific file like you suggest. It is up to the parser implementation.
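
A small sketch of that merge, using plain maps so it runs on its own. `default-org-config` and `org-config` are made-up names; in a real parser the user's section would presumably come from `(incise.config/get-in [:parsers :org])` rather than being passed in.

```clojure
(def default-org-config
  "Hypothetical defaults kept in the parser's own namespace."
  {:emacs-cmd "emacs"
   :eval-elisp []
   :default-export-format :html})

(defn org-config
  "Merge the user's [:parsers :org] section over the defaults."
  [user-config]
  (merge default-org-config (get-in user-config [:parsers :org])))

(org-config {:parsers {:org {:default-export-format :pdf}}})
;; => {:emacs-cmd "emacs", :eval-elisp [], :default-export-format :pdf}
```

A per-file override would be one more map merged on top, after the user's config.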

Also, one other thing that html-parser does that I forgot to mention before is it uses layouts. If you check out the README you'll see layouts are just an easy way to do shared String -> String functions (like html and body tags around the content of a post). If you do not use a string as an intermediary format you can just ignore them though.

@RyanMcG
Owner

RyanMcG commented Feb 3, 2014

Hey @luxbock, do you need any more help with this? I'd be glad to clarify anything you need me to.

@luxbock
Author

luxbock commented Feb 3, 2014

Sorry I recently switched from Windows to OSX and got distracted in the process.

I see you have reorganized the code into smaller repositories. I'm guessing my work with the org implementation should go to a new project called incise-org?

I'll get back to working on this and should have something out soon.

@luxbock
Author

luxbock commented Feb 3, 2014

One question: The way I'm implementing this now is to follow the steps of the html-parser and turn the org-file to a Parse. However I'm confused about what to do with the :extension property. It appears that it's needed for the path to be constructed correctly, and html-parser always sets it as /index.html?

@RyanMcG
Owner

RyanMcG commented Feb 3, 2014

> I see you have reorganized the code into smaller repositories. I'm guessing my work with the org implementation should go to a new project called incise-org?

Yes, I did restructure a bit. This was planned for 0.2.0. It does not change what you would be doing. Even 0.1.0 would have had you create a separate project (your.domain.org/incise-org-parser) to be consistent. It would not have been included in the incise project by default unless there was high demand. incise-core has no extensions and is just the framework. incise is just a dependency closure around incise-core and some select extensions.

> what to do with the :extension property. It appears that it's needed for the path to be constructed correctly, and html-parser always sets it as /index.html?

It can be overridden by the parse (in the parse-meta). The default is /index.html, so a page with the title bio is generated as: /bio/index.html. An extension of .html would have resulted in /bio.html. Of course, if this bio page had a :path set as 'donald_duck' the file would be written as: /donald_duck without an extension. In other words, it is just a string which is used when auto-generating a :path. It is not required if you generate the path differently. In fact, it is a value that is only really understood by html-parser.
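
The three cases above can be mimicked with a toy function. This is only an illustration of the described behaviour, not incise's actual path logic (which lives around `incise.parsers.utils/meta->write-path`); the name `sketch-write-path` and the way the title is slugged are assumptions.

```clojure
(require '[clojure.string :as s])

(defn sketch-write-path
  "Toy model: an explicit :path wins, otherwise title plus :extension."
  [{:keys [path title extension] :or {extension "/index.html"}}]
  (if path
    (str "/" path)
    (str "/" (s/lower-case (s/replace title #"\s+" "-")) extension)))

(sketch-write-path {:title "bio"})                      ; => "/bio/index.html"
(sketch-write-path {:title "bio" :extension ".html"})   ; => "/bio.html"
(sketch-write-path {:title "bio" :path "donald_duck"})  ; => "/donald_duck"
```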

@luxbock
Author

luxbock commented Feb 6, 2014

I took some time to read up on the best practices of publishing org-files, and it appears that a much more robust system can be achieved by allowing the user to configure their org-publish functionality in a separate incise.el file. The export/publishing functionality in org-mode is very powerful as it is, so exposing it to the user (with sane defaults of course) seems like the way to go.

The export functionality in org-mode works by having the user define "components" or "projects" that take variables such as the base-directory, the output-directory, the export format and a multitude of other options. These can then be combined together to form other projects/components. So the way I have it planned now is to have a higher-order function that takes the name of an org-publish component and creates an org-parser from that.
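
One possible shape for that higher-order function, as a sketch only. It assumes the project definitions live in an `incise.el` file that Emacs can load with `-l`, and it calls `org-publish-project` by name. `publish-command` and `org-project-parser` are made-up names, and the `:run` option exists purely so the shell call can be swapped out when trying the code without Emacs.

```clojure
(require '[clojure.java.shell :as shell])

(defn publish-command
  "Hypothetical helper: batch command that publishes one org project."
  [emacs-cmd el-file project]
  [emacs-cmd "--batch" "-l" el-file
   "--eval" (format "(org-publish-project %s)" (pr-str project))])

(defn org-project-parser
  "Higher-order function: publishing component name in, parser out."
  [project & {:keys [emacs-cmd el-file run]
              :or {emacs-cmd "emacs" el-file "incise.el" run shell/sh}}]
  (fn [_file]
    (delay
      (let [{:keys [exit err]} (apply run (publish-command emacs-cmd el-file project))]
        (when-not (zero? exit)
          (throw (ex-info "org-publish-project failed" {:exit exit :err err})))
        ;; Collecting the written paths from Emacs's output is left out here,
        ;; so this sketch reports no generated files.
        []))))
```

Because the publish happens inside the delay, nothing is recorded in step one; that is the trade-off discussed in the next paragraph.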

The easiest way to publish files from Emacs is to use the org-publish-project function, which publishes whatever files are included in the definition of that project. There's also a function for publishing individual files, but I can't figure out how to tell org-mode what project to use for it, which means I would not be taking full advantage of what org has to offer. Using org-publish-project exports any new or modified files, so this means that an org-parser that uses this function can potentially parse many files at once. I'm not sure how to best handle this as far as recording the parse with Incise goes. The output of the Emacs shell call allows me to retrieve the file paths of every file that got written, so I can always retrieve the metadata from those files afterwards. I know that recording the parse data is not strictly necessary, but it would be nice to have it play nicely with the way the other Incise parsers work, especially since I think a possible use case would be to use Emacs to export org files into HTML and then have Incise's layouts act on those HTML files before publishing. This is actually the way the org-Jekyll setup works. Do you have any ideas how to handle this? I put up my code so far into a gist in case it helps.

https://gist.github.com/luxbock/8848158

Any other comments are welcome as well.

@RyanMcG
Owner

RyanMcG commented Feb 6, 2014

> The export/publishing functionality in org-mode is very powerful as it is, so exposing it to the user (with sane defaults of course) seems like the way to go.

Awesome!

> Using org-publish-project exports any new or modified files, so this means that an org-parser that uses this function can potentially parse many files at once.

Incise supports generating many files from one, but not many to many (or one directory to another). A parser is invoked when it finds its extension on a file. Here is what I am thinking:

  • resources
    • content - Files parsed by incise
      • my-cool-project.el - A file containing config for the project, including the content directory, which in this made up directory structure is resources/org-content. The trick is handling the publishing directory (:publishing-directory in the el file). If you follow my advice below, the publishing directory should be a tmp directory.
    • org-content - A directory containing the org source of a project

Now we have a single entry point to org project parsing, and the org content is not going to be parsed by incise directly. If you want to record the files generated by org-publish-project as parses and pass them through layouts, then you'll have to have Emacs generate the files into a temporary directory and then read them in, getting the metadata, all in step 1. Step 2 can pass the read-in content through layouts and write the files into the output directory incise specifies.
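
Roughly, the two steps could look like the sketch below. All names are hypothetical: `publish!` stands for whatever shells out to Emacs with the project file and a publishing directory, `layout` is any String -> String function, and `out-dir` stands in for incise's output directory. Recording parse metadata is left out, and nested output directories are flattened to keep the example short.

```clojure
(require '[clojure.java.io :as io])
(import '(java.nio.file Files)
        '(java.nio.file.attribute FileAttribute))

(defn project-parser
  "Build a parser whose entry point is a single project .el file."
  [publish! layout out-dir]
  (fn [project-file]
    ;; Step 1: publish into a temp directory and read everything back in.
    (let [tmp-dir (.toFile (Files/createTempDirectory
                            "incise-org" (make-array FileAttribute 0)))]
      (publish! project-file tmp-dir)
      (let [pages (vec (for [f (file-seq tmp-dir) :when (.isFile f)]
                         {:name (.getName f) :content (slurp f)}))]
        ;; Step 2: apply the layout and write into the output directory.
        (delay
          (vec (for [{:keys [name content]} pages]
                 (let [out (io/file out-dir name)]
                   (io/make-parents out)
                   (spit out (layout content))
                   out))))))))
```

Passing `publish!` in as an argument keeps the Emacs call separate from the incise-facing part, which also makes the two steps easy to try with a fake publisher.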

@RyanMcG
Owner

RyanMcG commented Feb 26, 2015

ping
