+.
+
+````````````````````````````````
+
+
+Hard line breaks are for separating inline content within a block.
+Neither syntax for hard line breaks works at the end of a paragraph or
+other block element:
+
+```````````````````````````````` example
+foo\
+.
+<p>foo\</p>
+````````````````````````````````
+
+
+```````````````````````````````` example
+foo
+.
+<p>foo</p>
+````````````````````````````````
+
+
+```````````````````````````````` example
+### foo\
+.
+<h3>foo\</h3>
+````````````````````````````````
+
+
+```````````````````````````````` example
+### foo
+.
+<h3>foo</h3>
+````````````````````````````````
+
+
+## Soft line breaks
+
+A regular line ending (not in a code span or HTML tag) that is not
+preceded by two or more spaces or a backslash is parsed as a
+[softbreak](@). (A soft line break may be rendered in HTML either as a
+[line ending] or as a space. The result will be the same in
+browsers. In the examples here, a [line ending] will be used.)
+
+```````````````````````````````` example
+foo
+baz
+.
+<p>foo
+baz</p>
+````````````````````````````````
+
+
+Spaces at the end of the line and beginning of the next line are
+removed:
+
+```````````````````````````````` example
+foo
+ baz
+.
+<p>foo
+baz</p>
+````````````````````````````````
+
+
+A conforming parser may render a soft line break in HTML either as a
+line ending or as a space.
+
+A renderer may also provide an option to render soft line breaks
+as hard line breaks.
+
+## Textual content
+
+Any characters not given an interpretation by the above rules will
+be parsed as plain textual content.
+
+```````````````````````````````` example
+hello $.;'there
+.
+<p>hello $.;'there</p>
+````````````````````````````````
+
+
+```````````````````````````````` example
+Foo χρῆν
+.
+<p>Foo χρῆν</p>
+````````````````````````````````
+
+
+Internal spaces are preserved verbatim:
+
+```````````````````````````````` example
+Multiple     spaces
+.
+<p>Multiple     spaces</p>
+````````````````````````````````
+
+
+
+
+# Appendix: A parsing strategy
+
+In this appendix we describe some features of the parsing strategy
+used in the CommonMark reference implementations.
+
+## Overview
+
+Parsing has two phases:
+
+1. In the first phase, lines of input are consumed and the block
+structure of the document---its division into paragraphs, block quotes,
+list items, and so on---is constructed. Text is assigned to these
+blocks but not parsed. Link reference definitions are parsed and a
+map of links is constructed.
+
+2. In the second phase, the raw text contents of paragraphs and headings
+are parsed into sequences of Markdown inline elements (strings,
+code spans, links, emphasis, and so on), using the map of link
+references constructed in phase 1.
+
+At each point in processing, the document is represented as a tree of
+**blocks**. The root of the tree is a `document` block. The `document`
+may have any number of other blocks as **children**. These children
+may, in turn, have other blocks as children. The last child of a block
+is normally considered **open**, meaning that subsequent lines of input
+can alter its contents. (Blocks that are not open are **closed**.)
+Here, for example, is a possible document tree, with the open blocks
+marked by arrows:
+
+``` tree
+-> document
+  -> block_quote
+       paragraph
+         "Lorem ipsum dolor\nsit amet."
+    -> list (type=bullet tight=true bullet_char=-)
+         list_item
+           paragraph
+             "Qui *quodsi iracundia*"
+      -> list_item
+        -> paragraph
+             "aliquando id"
+```
+
+## Phase 1: block structure
+
+Each line that is processed has an effect on this tree. The line is
+analyzed and, depending on its contents, the document may be altered
+in one or more of the following ways:
+
+1. One or more open blocks may be closed.
+2. One or more new blocks may be created as children of the
+ last open block.
+3. Text may be added to the last (deepest) open block remaining
+ on the tree.
+
+Once a line has been incorporated into the tree in this way,
+it can be discarded, so input can be read in a stream.
+
+For each line, we follow this procedure:
+
+1. First we iterate through the open blocks, starting with the
+root document, and descending through last children down to the last
+open block. Each block imposes a condition that the line must satisfy
+if the block is to remain open. For example, a block quote requires a
+`>` character. A paragraph requires a non-blank line.
+In this phase we may match all or just some of the open
+blocks. But we cannot close unmatched blocks yet, because we may have a
+[lazy continuation line].
+
+2. Next, after consuming the continuation markers for existing
+blocks, we look for new block starts (e.g. `>` for a block quote).
+If we encounter a new block start, we close any blocks unmatched
+in step 1 before creating the new block as a child of the last
+matched container block.
+
+3. Finally, we look at the remainder of the line (after block
+markers like `>`, list markers, and indentation have been consumed).
+This is text that can be incorporated into the last open
+block (a paragraph, code block, heading, or raw HTML).
+
+Setext headings are formed when we see a line of a paragraph
+that is a [setext heading underline].
+
+Reference link definitions are detected when a paragraph is closed;
+the accumulated text lines are parsed to see if they begin with
+one or more reference link definitions. Any remainder becomes a
+normal paragraph.
+
+We can see how this works by considering how the tree above is
+generated by four lines of Markdown:
+
+``` markdown
+> Lorem ipsum dolor
+sit amet.
+> - Qui *quodsi iracundia*
+> - aliquando id
+```
+
+At the outset, our document model is just
+
+``` tree
+-> document
+```
+
+The first line of our text,
+
+``` markdown
+> Lorem ipsum dolor
+```
+
+causes a `block_quote` block to be created as a child of our
+open `document` block, and a `paragraph` block as a child of
+the `block_quote`. Then the text is added to the last open
+block, the `paragraph`:
+
+``` tree
+-> document
+  -> block_quote
+    -> paragraph
+         "Lorem ipsum dolor"
+```
+
+The next line,
+
+``` markdown
+sit amet.
+```
+
+is a "lazy continuation" of the open `paragraph`, so it gets added
+to the paragraph's text:
+
+``` tree
+-> document
+  -> block_quote
+    -> paragraph
+         "Lorem ipsum dolor\nsit amet."
+```
+
+The third line,
+
+``` markdown
+> - Qui *quodsi iracundia*
+```
+
+causes the `paragraph` block to be closed, and a new `list` block
+opened as a child of the `block_quote`. A `list_item` is also
+added as a child of the `list`, and a `paragraph` as a child of
+the `list_item`. The text is then added to the new `paragraph`:
+
+``` tree
+-> document
+  -> block_quote
+       paragraph
+         "Lorem ipsum dolor\nsit amet."
+    -> list (type=bullet tight=true bullet_char=-)
+      -> list_item
+        -> paragraph
+             "Qui *quodsi iracundia*"
+```
+
+The fourth line,
+
+``` markdown
+> - aliquando id
+```
+
+causes the `list_item` (and its child the `paragraph`) to be closed,
+and a new `list_item` opened up as child of the `list`. A `paragraph`
+is added as a child of the new `list_item`, to contain the text.
+We thus obtain the final tree:
+
+``` tree
+-> document
+  -> block_quote
+       paragraph
+         "Lorem ipsum dolor\nsit amet."
+    -> list (type=bullet tight=true bullet_char=-)
+         list_item
+           paragraph
+             "Qui *quodsi iracundia*"
+      -> list_item
+        -> paragraph
+             "aliquando id"
+```
+
+## Phase 2: inline structure
+
+Once all of the input has been parsed, all open blocks are closed.
+
+We then "walk the tree," visiting every node, and parse raw
+string contents of paragraphs and headings as inlines. At this
+point we have seen all the link reference definitions, so we can
+resolve reference links as we go.
+
+``` tree
+document
+  block_quote
+    paragraph
+      str "Lorem ipsum dolor"
+      softbreak
+      str "sit amet."
+    list (type=bullet tight=true bullet_char=-)
+      list_item
+        paragraph
+          str "Qui "
+          emph
+            str "quodsi iracundia"
+      list_item
+        paragraph
+          str "aliquando id"
+```
+
+Notice how the [line ending] in the first paragraph has
+been parsed as a `softbreak`, and the asterisks in the first list item
+have become an `emph`.
+
+### An algorithm for parsing nested emphasis and links
+
+By far the trickiest part of inline parsing is handling emphasis,
+strong emphasis, links, and images. This is done using the following
+algorithm.
+
+When we're parsing inlines and we hit either
+
+- a run of `*` or `_` characters, or
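+- a `[` or `![`
+
+we insert a text node with these symbols as its literal content, and we
+add a pointer to this text node to the [delimiter stack](@).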
+
+The [delimiter stack] is a doubly linked list. Each
+element contains a pointer to a text node, plus information about
+
+- the type of delimiter (`[`, `![`, `*`, `_`)
+- the number of delimiters,
+- whether the delimiter is "active" (all are active to start), and
+- whether the delimiter is a potential opener, a potential closer,
+ or both (which depends on what sort of characters precede
+ and follow the delimiters).
+
+When we hit a `]` character, we call the *look for link or image*
+procedure (see below).
+
+When we hit the end of the input, we call the *process emphasis*
+procedure (see below), with `stack_bottom` = NULL.
+
+#### *look for link or image*
+
+Starting at the top of the delimiter stack, we look backwards
+through the stack for an opening `[` or `![` delimiter.
+
+- If we don't find one, we return a literal text node `]`.
+
+- If we do find one, but it's not *active*, we remove the inactive
+ delimiter from the stack, and return a literal text node `]`.
+
+- If we find one and it's active, then we parse ahead to see if
+ we have an inline link/image, reference link/image, collapsed reference
+ link/image, or shortcut reference link/image.
+
+ + If we don't, then we remove the opening delimiter from the
+ delimiter stack and return a literal text node `]`.
+
+ + If we do, then
+
+ * We return a link or image node whose children are the inlines
+ after the text node pointed to by the opening delimiter.
+
+ * We run *process emphasis* on these inlines, with the `[` opener
+ as `stack_bottom`.
+
+ * We remove the opening delimiter.
+
+ * If we have a link (and not an image), we also set all
+ `[` delimiters before the opening delimiter to *inactive*. (This
+ will prevent us from getting links within links.)
+
+#### *process emphasis*
+
+Parameter `stack_bottom` sets a lower bound to how far we
+descend in the [delimiter stack]. If it is NULL, we can
+go all the way to the bottom. Otherwise, we stop before
+visiting `stack_bottom`.
+
+Let `current_position` point to the element on the [delimiter stack]
+just above `stack_bottom` (or the first element if `stack_bottom`
+is NULL).
+
+We keep track of the `openers_bottom` for each delimiter
+type (`*`, `_`), indexed to the length of the closing delimiter run
+(modulo 3) and to whether the closing delimiter can also be an
+opener. Initialize this to `stack_bottom`.
+
+Then we repeat the following until we run out of potential
+closers:
+
+- Move `current_position` forward in the delimiter stack (if needed)
+ until we find the first potential closer with delimiter `*` or `_`.
+ (This will be the potential closer closest
+ to the beginning of the input -- the first one in parse order.)
+
+- Now, look back in the stack (staying above `stack_bottom` and
+ the `openers_bottom` for this delimiter type) for the
+ first matching potential opener ("matching" means same delimiter).
+
+- If one is found:
+
+ + Figure out whether we have emphasis or strong emphasis:
+ if both closer and opener spans have length >= 2, we have
+ strong, otherwise regular.
+
+ + Insert an emph or strong emph node accordingly, after
+ the text node corresponding to the opener.
+
+ + Remove any delimiters between the opener and closer from
+ the delimiter stack.
+
+ + Remove 1 (for regular emph) or 2 (for strong emph) delimiters
+ from the opening and closing text nodes. If they become empty
+ as a result, remove them and remove the corresponding element
+ of the delimiter stack. If the closing node is removed, reset
+ `current_position` to the next element in the stack.
+
+- If none is found:
+
+ + Set `openers_bottom` to the element before `current_position`.
+ (We know that there are no openers for this kind of closer up to and
+ including this point, so this puts a lower bound on future searches.)
+
+ + If the closer at `current_position` is not a potential opener,
+ remove it from the delimiter stack (since we know it can't
+ be a closer either).
+
+ + Advance `current_position` to the next element in the stack.
+
+After we're done, we remove all delimiters above `stack_bottom` from the
+delimiter stack.
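
The *process emphasis* procedure described in the appendix above can be illustrated with a small, self-contained sketch. This is not the reference implementation and not the code in this repository. It makes several simplifying assumptions, all of them hypothetical choices for illustration: only `*` is handled, a run counts as a potential opener when it is not followed by whitespace and as a potential closer when it is not preceded by whitespace (the punctuation rules are ignored), the `openers_bottom` optimization and the "multiple of 3" rule are left out, and the output is a flat string rather than a node tree. The names `EmphasisSketch`, `Piece`, `tokenize`, `processEmphasis`, and `render` are invented here.

```java
import java.util.ArrayList;
import java.util.List;

class EmphasisSketch {

    /** Either a piece of plain text (text != null) or a run of '*' delimiters. */
    static final class Piece {
        String text;                 // literal text, or null for a delimiter run
        int count;                   // '*' characters still unused in this run
        boolean canOpen, canClose;   // simplified flanking flags (assumption)
        final StringBuilder before = new StringBuilder(); // closing tags placed before leftover '*'
        final StringBuilder after = new StringBuilder();  // opening tags placed after leftover '*'
    }

    /** Splits the input into text pieces and delimiter runs. */
    static List<Piece> tokenize(String s) {
        List<Piece> pieces = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int start = i;
            Piece p = new Piece();
            if (s.charAt(i) == '*') {
                while (i < s.length() && s.charAt(i) == '*') i++;
                p.count = i - start;
                boolean spaceBefore = start == 0 || Character.isWhitespace(s.charAt(start - 1));
                boolean spaceAfter = i == s.length() || Character.isWhitespace(s.charAt(i));
                p.canOpen = !spaceAfter;
                p.canClose = !spaceBefore;
            } else {
                while (i < s.length() && s.charAt(i) != '*') i++;
                p.text = s.substring(start, i);
            }
            pieces.add(p);
        }
        return pieces;
    }

    static boolean isOpener(Piece p) {
        return p.text == null && p.canOpen && p.count > 0;
    }

    /** Walks closers from left to right and pairs each with the nearest opener before it. */
    static void processEmphasis(List<Piece> pieces) {
        int current = 0;
        while (current < pieces.size()) {
            Piece closer = pieces.get(current);
            if (closer.text != null || !closer.canClose || closer.count == 0) {
                current++;
                continue;
            }
            int o = current - 1;
            while (o >= 0 && !isOpener(pieces.get(o))) o--;
            if (o < 0) {             // no opener found: move on to the next closer
                current++;
                continue;
            }
            Piece opener = pieces.get(o);
            boolean strong = opener.count >= 2 && closer.count >= 2;
            String tag = strong ? "strong" : "em";
            // Outer tags go closest to the leftover delimiters on both sides.
            opener.after.insert(0, "<" + tag + ">");
            closer.before.append("</" + tag + ">");
            int used = strong ? 2 : 1;
            opener.count -= used;
            closer.count -= used;
            // Delimiters between opener and closer can no longer take part in matches.
            for (int k = o + 1; k < current; k++) {
                pieces.get(k).canOpen = false;
                pieces.get(k).canClose = false;
            }
            if (closer.count == 0) current++; // otherwise retry the same closer
        }
    }

    /** Renders the input with emphasis resolved; no HTML escaping is done in this sketch. */
    static String render(String input) {
        List<Piece> pieces = tokenize(input);
        processEmphasis(pieces);
        StringBuilder out = new StringBuilder();
        for (Piece p : pieces) {
            if (p.text != null) {
                out.append(p.text);
            } else {
                out.append(p.before).append("*".repeat(p.count)).append(p.after);
            }
        }
        return out.toString();
    }
}
```

For instance, `EmphasisSketch.render("***foo***")` first pairs two delimiters from each run as strong emphasis, then the remaining single delimiters as regular emphasis, which mirrors the "remove 1 or 2 delimiters and retry" step above. Unmatched runs are left in the output as literal asterisks.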
diff --git a/commonmark/pom.xml b/commonmark/pom.xml
index be18858ad..4e060edaa 100644
--- a/commonmark/pom.xml
+++ b/commonmark/pom.xml
@@ -2,19 +2,19 @@
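     <modelVersion>4.0.0</modelVersion>
-        <groupId>com.atlassian.commonmark</groupId>
+        <groupId>org.commonmark</groupId>
         <artifactId>commonmark-parent</artifactId>
-        <version>0.1.1-SNAPSHOT</version>
+        <version>0.28.1-SNAPSHOT</version>
     <artifactId>commonmark</artifactId>
     <name>commonmark-java core</name>
-    <description>Core of commonmark-java</description>
+    <description>Core of commonmark-java (a library for parsing Markdown to an AST, modifying the AST and rendering it to HTML or Markdown)</description>
-            <groupId>junit</groupId>
-            <artifactId>junit</artifactId>
+            <groupId>org.commonmark</groupId>
+            <artifactId>commonmark-test-util</artifactId>
             <scope>test</scope>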
@@ -29,21 +29,37 @@
-
-
-
- org.apache.maven.plugins
- maven-jar-plugin
- 2.6
-
-
-
- test-jar
-
-
-
-
-
-
+
+
+ benchmark
+
+ exec:exec
+
+
+ org.codehaus.mojo
+ exec-maven-plugin
+ 3.2.0
+
+ java
+ test
+
+ -classpath
+
+ org.commonmark.test.SpecBenchmark
+
+
+
+
+
+
+
+
+
+
+ BSD-2-Clause
+ https://opensource.org/licenses/BSD-2-Clause
+ repo
+
+
diff --git a/commonmark/src/main/java/module-info.java b/commonmark/src/main/java/module-info.java
new file mode 100644
index 000000000..009fc7d18
--- /dev/null
+++ b/commonmark/src/main/java/module-info.java
@@ -0,0 +1,13 @@
+module org.commonmark {
+ exports org.commonmark;
+ exports org.commonmark.node;
+ exports org.commonmark.parser;
+ exports org.commonmark.parser.beta;
+ exports org.commonmark.parser.block;
+ exports org.commonmark.parser.delimiter;
+ exports org.commonmark.renderer;
+ exports org.commonmark.renderer.html;
+ exports org.commonmark.renderer.markdown;
+ exports org.commonmark.renderer.text;
+ exports org.commonmark.text;
+}
diff --git a/commonmark/src/main/java/org/commonmark/html/AttributeProvider.java b/commonmark/src/main/java/org/commonmark/html/AttributeProvider.java
deleted file mode 100644
index e5f62365d..000000000
--- a/commonmark/src/main/java/org/commonmark/html/AttributeProvider.java
+++ /dev/null
@@ -1,25 +0,0 @@
-package org.commonmark.html;
-
-import org.commonmark.node.Node;
-
-import java.util.Map;
-
-/**
- * Extension point for adding/changing attributes on the primary HTML tag for a node.
- */
-public interface AttributeProvider {
-
- /**
- * Set the attributes for the node by modifying the provided map.
- *
- * This allows to change or even remove default attributes. With great power comes great responsibility.
- *
- * The attribute key and values will be escaped (preserving character entities), so don't escape them here,
- * otherwise they will be double-escaped.
- *
- * @param node the node to set attributes for
- * @param attributes the attributes, with any default attributes already set in the map
- */
-    void setAttributes(Node node, Map<String, String> attributes);
-
-}
diff --git a/commonmark/src/main/java/org/commonmark/html/CustomHtmlRenderer.java b/commonmark/src/main/java/org/commonmark/html/CustomHtmlRenderer.java
deleted file mode 100644
index cf414a35e..000000000
--- a/commonmark/src/main/java/org/commonmark/html/CustomHtmlRenderer.java
+++ /dev/null
@@ -1,10 +0,0 @@
-package org.commonmark.html;
-
-import org.commonmark.node.Node;
-import org.commonmark.node.Visitor;
-
-public interface CustomHtmlRenderer {
- // TODO: maybe pass renderer instead of visitor?
- boolean render(Node node, HtmlWriter htmlWriter, Visitor visitor);
-}
-
diff --git a/commonmark/src/main/java/org/commonmark/html/HtmlRenderer.java b/commonmark/src/main/java/org/commonmark/html/HtmlRenderer.java
deleted file mode 100644
index ce001a451..000000000
--- a/commonmark/src/main/java/org/commonmark/html/HtmlRenderer.java
+++ /dev/null
@@ -1,398 +0,0 @@
-package org.commonmark.html;
-
-import org.commonmark.Extension;
-import org.commonmark.internal.util.Escaping;
-import org.commonmark.node.*;
-
-import java.util.*;
-
-public class HtmlRenderer {
-
-    private static final Map<String, String> NO_ATTRIBUTES = Collections.emptyMap();
-
-    private final String softbreak;
-    private final boolean escapeHtml;
-    private final boolean percentEncodeUrls;
-    private final List<CustomHtmlRenderer> customHtmlRenderers;
-    private final List<AttributeProvider> attributeProviders;
-
- private HtmlRenderer(Builder builder) {
- this.softbreak = builder.softbreak;
- this.escapeHtml = builder.escapeHtml;
- this.percentEncodeUrls = builder.percentEncodeUrls;
- this.customHtmlRenderers = builder.customHtmlRenderers;
- this.attributeProviders = builder.attributeProviders;
- }
-
- public static Builder builder() {
- return new Builder();
- }
-
- public void render(Node node, Appendable output) {
- RendererVisitor rendererVisitor = new RendererVisitor(new HtmlWriter(output), customHtmlRenderers);
- node.accept(rendererVisitor);
- }
-
- public String render(Node node) {
- StringBuilder sb = new StringBuilder();
- render(node, sb);
- return sb.toString();
- }
-
- private String escape(String input, boolean preserveEntities) {
- return Escaping.escapeHtml(input, preserveEntities);
- }
-
- private String optionallyPercentEncodeUrl(String url) {
- if (percentEncodeUrls) {
- return Escaping.percentEncodeUrl(url);
- } else {
- return url;
- }
- }
-
- // default options:
- // softbreak: '\n', // by default, soft breaks are rendered as newlines in
- // HTML
-    // set to "<br />" to make them hard breaks
-    // set to " " if you want to ignore line wrapping in source
-    public static class Builder {
-
-        private String softbreak = "\n";
-        private boolean escapeHtml = false;
-        private boolean percentEncodeUrls = false;
-        private List<CustomHtmlRenderer> customHtmlRenderers = new ArrayList<>();
-        private List<AttributeProvider> attributeProviders = new ArrayList<>();
-
- public Builder softbreak(String softbreak) {
- this.softbreak = softbreak;
- return this;
- }
-
- /**
- * Whether {@link HtmlTag} and {@link HtmlBlock} should be escaped.
- *
- * Note that {@link HtmlTag} is only a tag itself, not the text between an opening tag and a closing tag. So markup
- * in the text will be parsed as normal and is not affected by this option.
- *
- * @param escapeHtml true for escaping, false for preserving raw HTML
- * @return {@code this}
- */
- public Builder escapeHtml(boolean escapeHtml) {
- this.escapeHtml = escapeHtml;
- return this;
- }
-
- /**
- * Whether URLs of link or images should be percent-encoded. If enabled, the following is done:
- *
- * - Existing percent-encoded parts are preserved (e.g. "%20" is kept as "%20")
- * - Reserved characters such as "/" are preserved, except for "[" and "]" (see encodeURI in JS)
- * - Unreserved characters such as "a" are preserved
- * - Other characters such umlauts are percent-encoded
- *
- *
- * @param percentEncodeUrls true to percent-encode, false for leaving as-is; default is false
- * @return {@code this}
- */
- public Builder percentEncodeUrls(boolean percentEncodeUrls) {
- this.percentEncodeUrls = percentEncodeUrls;
- return this;
- }
-
- public Builder attributeProvider(AttributeProvider attributeProvider) {
- this.attributeProviders.add(attributeProvider);
- return this;
- }
-
- public Builder customHtmlRenderer(CustomHtmlRenderer customHtmlRenderer) {
- this.customHtmlRenderers.add(customHtmlRenderer);
- return this;
- }
-
- /**
- * @param extensions extensions to use on this HTML renderer
- * @return this
- */
-        public Builder extensions(Iterable<? extends Extension> extensions) {
- for (Extension extension : extensions) {
- if (extension instanceof HtmlRendererExtension) {
- HtmlRendererExtension htmlRendererExtension = (HtmlRendererExtension) extension;
- htmlRendererExtension.extend(this);
- }
- }
- return this;
- }
-
- public HtmlRenderer build() {
- return new HtmlRenderer(this);
- }
- }
-
- /**
- * Extension for HTML renderer.
- */
- public interface HtmlRendererExtension extends Extension {
- void extend(Builder rendererBuilder);
- }
-
- private class RendererVisitor extends AbstractVisitor {
-
- private final HtmlWriter html;
-        private final List<CustomHtmlRenderer> customHtmlRenderers;
-
-        public RendererVisitor(HtmlWriter html, List<CustomHtmlRenderer> customHtmlRenderers) {
- this.html = html;
- this.customHtmlRenderers = customHtmlRenderers;
- }
-
- @Override
- public void visit(Document document) {
- visitChildren(document);
- }
-
- @Override
- public void visit(Header header) {
- String htag = "h" + header.getLevel();
- html.line();
- html.tag(htag, getAttrs(header));
- visitChildren(header);
- html.tag('/' + htag);
- html.line();
- }
-
- @Override
- public void visit(Paragraph paragraph) {
- boolean inTightList = isInTightList(paragraph);
- if (!inTightList) {
- html.line();
- html.tag("p", getAttrs(paragraph));
- }
- visitChildren(paragraph);
- if (!inTightList) {
- html.tag("/p");
- html.line();
- }
- }
-
- @Override
- public void visit(BlockQuote blockQuote) {
- html.line();
- html.tag("blockquote", getAttrs(blockQuote));
- html.line();
- visitChildren(blockQuote);
- html.line();
- html.tag("/blockquote");
- html.line();
- }
-
- @Override
- public void visit(BulletList bulletList) {
- renderListBlock(bulletList, "ul", getAttrs(bulletList));
- }
-
- @Override
- public void visit(FencedCodeBlock fencedCodeBlock) {
- String literal = fencedCodeBlock.getLiteral();
-            Map<String, String> attributes = new LinkedHashMap<>();
- String info = fencedCodeBlock.getInfo();
- if (info != null && !info.isEmpty()) {
- int space = info.indexOf(" ");
- String language;
- if (space == -1) {
- language = info;
- } else {
- language = info.substring(0, space);
- }
- attributes.put("class", "language-" + language);
- }
- renderCodeBlock(literal, getAttrs(fencedCodeBlock, attributes));
- }
-
- @Override
- public void visit(HtmlBlock htmlBlock) {
- html.line();
- if (escapeHtml) {
- html.raw(escape(htmlBlock.getLiteral(), false));
- } else {
- html.raw(htmlBlock.getLiteral());
- }
- html.line();
- }
-
- @Override
- public void visit(HorizontalRule horizontalRule) {
- html.line();
- html.tag("hr", getAttrs(horizontalRule), true);
- html.line();
- }
-
- @Override
- public void visit(IndentedCodeBlock indentedCodeBlock) {
- renderCodeBlock(indentedCodeBlock.getLiteral(), getAttrs(indentedCodeBlock));
- }
-
- @Override
- public void visit(Link link) {
-            Map<String, String> attrs = new LinkedHashMap<>();
- String url = optionallyPercentEncodeUrl(link.getDestination());
- attrs.put("href", url);
- if (link.getTitle() != null) {
- attrs.put("title", link.getTitle());
- }
- html.tag("a", getAttrs(link, attrs));
- visitChildren(link);
- html.tag("/a");
- }
-
- @Override
- public void visit(ListItem listItem) {
- html.tag("li", getAttrs(listItem));
- visitChildren(listItem);
- html.tag("/li");
- html.line();
- }
-
- @Override
- public void visit(OrderedList orderedList) {
- int start = orderedList.getStartNumber();
-            Map<String, String> attrs = new LinkedHashMap<>();
- if (start != 1) {
- attrs.put("start", String.valueOf(start));
- }
- renderListBlock(orderedList, "ol", getAttrs(orderedList, attrs));
- }
-
- @Override
- public void visit(Image image) {
- if (html.isTagAllowed()) {
- String url = optionallyPercentEncodeUrl(image.getDestination());
- html.raw("
");
- }
- }
-
- @Override
- public void visit(Emphasis emphasis) {
- html.tag("em");
- visitChildren(emphasis);
- html.tag("/em");
- }
-
- @Override
- public void visit(StrongEmphasis strongEmphasis) {
- html.tag("strong");
- visitChildren(strongEmphasis);
- html.tag("/strong");
- }
-
- @Override
- public void visit(Text text) {
- html.raw(escape(text.getLiteral(), false));
- }
-
- @Override
- public void visit(Code code) {
- html.tag("code");
- html.raw(escape(code.getLiteral(), false));
- html.tag("/code");
- }
-
- @Override
- public void visit(HtmlTag htmlTag) {
- if (escapeHtml) {
- html.raw(escape(htmlTag.getLiteral(), false));
- } else {
- html.raw(htmlTag.getLiteral());
- }
- }
-
- @Override
- public void visit(SoftLineBreak softLineBreak) {
- html.raw(softbreak);
- }
-
- @Override
- public void visit(HardLineBreak hardLineBreak) {
- html.tag("br", NO_ATTRIBUTES, true);
- html.line();
- }
-
- @Override
- public void visit(CustomBlock customBlock) {
- renderCustom(customBlock);
- }
-
- @Override
- public void visit(CustomNode customNode) {
- renderCustom(customNode);
- }
-
- private void renderCustom(Node node) {
- for (CustomHtmlRenderer customHtmlRenderer : customHtmlRenderers) {
- // TODO: Should we pass attributes here?
- boolean handled = customHtmlRenderer.render(node, html, this);
- if (handled) {
- break;
- }
- }
- }
-
-        private void renderCodeBlock(String literal, Map<String, String> attributes) {
- html.line();
- html.tag("pre");
- html.tag("code", attributes);
- html.raw(escape(literal, false));
- html.tag("/code");
- html.tag("/pre");
- html.line();
- }
-
-        private void renderListBlock(ListBlock listBlock, String tagName, Map<String, String> attributes) {
- html.line();
- html.tag(tagName, attributes);
- html.line();
- visitChildren(listBlock);
- html.line();
- html.tag('/' + tagName);
- html.line();
- }
-
- private boolean isInTightList(Paragraph paragraph) {
- Node parent = paragraph.getParent();
- if (parent != null) {
- Node gramps = parent.getParent();
- if (gramps != null && gramps instanceof ListBlock) {
- ListBlock list = (ListBlock) gramps;
- return list.isTight();
- }
- }
- return false;
- }
-
-        private Map<String, String> getAttrs(Node node) {
-            return getAttrs(node, Collections.<String, String>emptyMap());
-        }
-
-        private Map<String, String> getAttrs(Node node, Map<String, String> defaultAttributes) {
-            Map<String, String> attrs = new LinkedHashMap<>(defaultAttributes);
-            setCustomAttributes(node, attrs);
-            return attrs;
-        }
-
-        private void setCustomAttributes(Node node, Map<String, String> attrs) {
- for (AttributeProvider attributeProvider : attributeProviders) {
- attributeProvider.setAttributes(node, attrs);
- }
- }
- }
-}
diff --git a/commonmark/src/main/java/org/commonmark/internal/BlockContent.java b/commonmark/src/main/java/org/commonmark/internal/BlockContent.java
index f278c20c0..9a9ce6f44 100644
--- a/commonmark/src/main/java/org/commonmark/internal/BlockContent.java
+++ b/commonmark/src/main/java/org/commonmark/internal/BlockContent.java
@@ -22,10 +22,6 @@ public void add(CharSequence line) {
lineCount++;
}
- public boolean hasSingleLine() {
- return lineCount == 1;
- }
-
public String getString() {
return sb.toString();
}
diff --git a/commonmark/src/main/java/org/commonmark/internal/BlockQuoteParser.java b/commonmark/src/main/java/org/commonmark/internal/BlockQuoteParser.java
index 247af08cc..572c491f8 100644
--- a/commonmark/src/main/java/org/commonmark/internal/BlockQuoteParser.java
+++ b/commonmark/src/main/java/org/commonmark/internal/BlockQuoteParser.java
@@ -1,8 +1,10 @@
package org.commonmark.internal;
+import org.commonmark.internal.util.Parsing;
import org.commonmark.node.Block;
import org.commonmark.node.BlockQuote;
import org.commonmark.parser.block.*;
+import org.commonmark.text.Characters;
public class BlockQuoteParser extends AbstractBlockParser {
@@ -26,29 +28,34 @@ public BlockQuote getBlock() {
@Override
public BlockContinue tryContinue(ParserState state) {
int nextNonSpace = state.getNextNonSpaceIndex();
- CharSequence line = state.getLine();
- if (state.getIndent() <= 3 && nextNonSpace < line.length() && line.charAt(nextNonSpace) == '>') {
- int newIndex = nextNonSpace + 1;
- if (newIndex < line.length() && line.charAt(newIndex) == ' ') {
- newIndex++;
+ if (isMarker(state, nextNonSpace)) {
+ int newColumn = state.getColumn() + state.getIndent() + 1;
+ // optional following space or tab
+ if (Characters.isSpaceOrTab(state.getLine().getContent(), nextNonSpace + 1)) {
+ newColumn++;
}
- return BlockContinue.atIndex(newIndex);
+ return BlockContinue.atColumn(newColumn);
} else {
return BlockContinue.none();
}
}
+ private static boolean isMarker(ParserState state, int index) {
+ CharSequence line = state.getLine().getContent();
+ return state.getIndent() < Parsing.CODE_BLOCK_INDENT && index < line.length() && line.charAt(index) == '>';
+ }
+
public static class Factory extends AbstractBlockParserFactory {
+ @Override
public BlockStart tryStart(ParserState state, MatchedBlockParser matchedBlockParser) {
- CharSequence line = state.getLine();
int nextNonSpace = state.getNextNonSpaceIndex();
- if (state.getIndent() < 4 && line.charAt(nextNonSpace) == '>') {
- int newOffset = nextNonSpace + 1;
- // optional following space
- if (newOffset < line.length() && line.charAt(newOffset) == ' ') {
- newOffset++;
+ if (isMarker(state, nextNonSpace)) {
+ int newColumn = state.getColumn() + state.getIndent() + 1;
+ // optional following space or tab
+ if (Characters.isSpaceOrTab(state.getLine().getContent(), nextNonSpace + 1)) {
+ newColumn++;
}
- return BlockStart.of(new BlockQuoteParser()).atIndex(newOffset);
+ return BlockStart.of(new BlockQuoteParser()).atColumn(newColumn);
} else {
return BlockStart.none();
}
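
The marker check shared by `tryContinue` and `tryStart` in the hunk above can be sketched in isolation. The following is a simplified illustration, not the parser's actual code: it works on character indexes rather than columns (the real change moves to `atColumn` so that tabs are handled), it treats only spaces as indentation, and the class and method names (`BlockQuoteMarkerSketch`, `contentIndexAfterMarker`) are hypothetical. The constant `CODE_BLOCK_INDENT = 4` is assumed to match `Parsing.CODE_BLOCK_INDENT`.

```java
class BlockQuoteMarkerSketch {

    // Assumed to mirror Parsing.CODE_BLOCK_INDENT: four or more spaces start an indented code block.
    static final int CODE_BLOCK_INDENT = 4;

    /**
     * Returns the index of the first content character after a block quote marker
     * (the '>' plus one optional space or tab), or -1 if the line has no marker.
     */
    static int contentIndexAfterMarker(CharSequence line) {
        int i = 0;
        // Count leading spaces (tabs in the indentation are ignored in this sketch).
        while (i < line.length() && line.charAt(i) == ' ') {
            i++;
        }
        int indent = i;
        if (indent >= CODE_BLOCK_INDENT || i >= line.length() || line.charAt(i) != '>') {
            return -1;
        }
        i++; // skip '>'
        // One optional space or tab belongs to the marker, not to the content.
        if (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }
}
```

The bounds check before `charAt` corresponds to the `index < line.length()` test in `isMarker`, which the old `tryStart` code did not perform.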
diff --git a/commonmark/src/main/java/org/commonmark/internal/BlockStartImpl.java b/commonmark/src/main/java/org/commonmark/internal/BlockStartImpl.java
index c7e967d46..516f944b2 100644
--- a/commonmark/src/main/java/org/commonmark/internal/BlockStartImpl.java
+++ b/commonmark/src/main/java/org/commonmark/internal/BlockStartImpl.java
@@ -9,6 +9,7 @@ public class BlockStartImpl extends BlockStart {
private int newIndex = -1;
private int newColumn = -1;
private boolean replaceActiveBlockParser = false;
+ private int replaceParagraphLines = 0;
public BlockStartImpl(BlockParser... blockParsers) {
this.blockParsers = blockParsers;
@@ -30,6 +31,10 @@ public boolean isReplaceActiveBlockParser() {
return replaceActiveBlockParser;
}
+ int getReplaceParagraphLines() {
+ return replaceParagraphLines;
+ }
+
@Override
public BlockStart atIndex(int newIndex) {
this.newIndex = newIndex;
@@ -48,4 +53,12 @@ public BlockStart replaceActiveBlockParser() {
return this;
}
+ @Override
+ public BlockStart replaceParagraphLines(int lines) {
+ if (!(lines >= 1)) {
+ throw new IllegalArgumentException("Lines must be >= 1");
+ }
+ this.replaceParagraphLines = lines;
+ return this;
+ }
}
diff --git a/commonmark/src/main/java/org/commonmark/internal/Bracket.java b/commonmark/src/main/java/org/commonmark/internal/Bracket.java
new file mode 100644
index 000000000..c04b6ecda
--- /dev/null
+++ b/commonmark/src/main/java/org/commonmark/internal/Bracket.java
@@ -0,0 +1,73 @@
+package org.commonmark.internal;
+
+import org.commonmark.node.Text;
+import org.commonmark.parser.beta.Position;
+
+/**
+ * Opening bracket for links ({@code [}), images ({@code ![}), or links with other markers.
+ */
+public class Bracket {
+
+ /**
+ * The node of a marker such as {@code !} if present, null otherwise.
+ */
+ public final Text markerNode;
+
+ /**
+ * The position of the marker if present, null otherwise.
+ */
+ public final Position markerPosition;
+
+ /**
+ * The node of {@code [}.
+ */
+ public final Text bracketNode;
+
+ /**
+ * The position of {@code [}.
+ */
+ public final Position bracketPosition;
+
+ /**
+ * The position of the content (after the opening bracket)
+ */
+ public final Position contentPosition;
+
+ /**
+ * Previous bracket.
+ */
+ public final Bracket previous;
+
+ /**
+ * Previous delimiter (emphasis, etc) before this bracket.
+ */
+ public final Delimiter previousDelimiter;
+
+ /**
+ * Whether this bracket is allowed to form a link/image (also known as "active").
+ */
+ public boolean allowed = true;
+
+ /**
+ * Whether there is an unescaped bracket (opening or closing) after this opening bracket in the text parsed so far.
+ */
+ public boolean bracketAfter = false;
+
+ static public Bracket link(Text bracketNode, Position bracketPosition, Position contentPosition, Bracket previous, Delimiter previousDelimiter) {
+ return new Bracket(null, null, bracketNode, bracketPosition, contentPosition, previous, previousDelimiter);
+ }
+
+ static public Bracket withMarker(Text markerNode, Position markerPosition, Text bracketNode, Position bracketPosition, Position contentPosition, Bracket previous, Delimiter previousDelimiter) {
+ return new Bracket(markerNode, markerPosition, bracketNode, bracketPosition, contentPosition, previous, previousDelimiter);
+ }
+
+ private Bracket(Text markerNode, Position markerPosition, Text bracketNode, Position bracketPosition, Position contentPosition, Bracket previous, Delimiter previousDelimiter) {
+ this.markerNode = markerNode;
+ this.markerPosition = markerPosition;
+ this.bracketNode = bracketNode;
+ this.bracketPosition = bracketPosition;
+ this.contentPosition = contentPosition;
+ this.previous = previous;
+ this.previousDelimiter = previousDelimiter;
+ }
+}
diff --git a/commonmark/src/main/java/org/commonmark/internal/Definitions.java b/commonmark/src/main/java/org/commonmark/internal/Definitions.java
new file mode 100644
index 000000000..0377842c9
--- /dev/null
+++ b/commonmark/src/main/java/org/commonmark/internal/Definitions.java
@@ -0,0 +1,33 @@
+package org.commonmark.internal;
+
+import org.commonmark.node.DefinitionMap;
+
+import java.util.HashMap;
+import java.util.Map;
+
+public class Definitions {
+
+ private final Map<Class<?>, DefinitionMap<?>> definitionsByType = new HashMap<>();
+
+ public <D> void addDefinitions(DefinitionMap<D> definitionMap) {
+ var existingMap = getMap(definitionMap.getType());
+ if (existingMap == null) {
+ definitionsByType.put(definitionMap.getType(), definitionMap);
+ } else {
+ existingMap.addAll(definitionMap);
+ }
+ }
+
+ public <V> V getDefinition(Class<V> type, String label) {
+ var definitionMap = getMap(type);
+ if (definitionMap == null) {
+ return null;
+ }
+ return definitionMap.get(label);
+ }
+
+ private <V> DefinitionMap<V> getMap(Class<V> type) {
+ //noinspection unchecked
+ return (DefinitionMap<V>) definitionsByType.get(type);
+ }
+}
diff --git a/commonmark/src/main/java/org/commonmark/internal/Delimiter.java b/commonmark/src/main/java/org/commonmark/internal/Delimiter.java
index 127a834b5..9083ce3cb 100644
--- a/commonmark/src/main/java/org/commonmark/internal/Delimiter.java
+++ b/commonmark/src/main/java/org/commonmark/internal/Delimiter.java
@@ -1,61 +1,82 @@
package org.commonmark.internal;
-import org.commonmark.node.Node;
import org.commonmark.node.Text;
+import org.commonmark.parser.delimiter.DelimiterRun;
-class Delimiter {
+import java.util.List;
- final Text node;
- final int index;
+/**
+ * Delimiter (emphasis, strong emphasis or custom emphasis).
+ */
+public class Delimiter implements DelimiterRun {
- Delimiter previous;
- Delimiter next;
+ public final List<Text> characters;
+ public final char delimiterChar;
+ private final int originalLength;
- char delimiterChar;
- int numDelims = 1;
+ // Can open emphasis, see spec.
+ private final boolean canOpen;
- /**
- * Can open emphasis, see spec.
- */
- boolean canOpen = true;
+ // Can close emphasis, see spec.
+ private final boolean canClose;
- /**
- * Can close emphasis, see spec.
- */
- boolean canClose = false;
+ public Delimiter previous;
+ public Delimiter next;
- /**
- * Whether this delimiter is allowed to form a link/image.
- */
- boolean allowed = true;
+ public Delimiter(List<Text> characters, char delimiterChar, boolean canOpen, boolean canClose, Delimiter previous) {
+ this.characters = characters;
+ this.delimiterChar = delimiterChar;
+ this.canOpen = canOpen;
+ this.canClose = canClose;
+ this.previous = previous;
+ this.originalLength = characters.size();
+ }
- /**
- * Skip this delimiter when looking for a link/image opener because it was already matched.
- */
- boolean matched = false;
+ @Override
+ public boolean canOpen() {
+ return canOpen;
+ }
- Delimiter(Text node, Delimiter previous, int index) {
- this.node = node;
- this.previous = previous;
- this.index = index;
+ @Override
+ public boolean canClose() {
+ return canClose;
}
- Text getPreviousNonDelimiterTextNode() {
- Node previousNode = node.getPrevious();
- if (previousNode instanceof Text && (this.previous == null || this.previous.node != previousNode)) {
- return (Text) previousNode;
- } else {
- return null;
- }
+ @Override
+ public int length() {
+ return characters.size();
+ }
+
+ @Override
+ public int originalLength() {
+ return originalLength;
}
- Text getNextNonDelimiterTextNode() {
- Node nextNode = node.getNext();
- if (nextNode instanceof Text && (this.next == null || this.next.node != nextNode)) {
- return (Text) nextNode;
- } else {
- return null;
+ @Override
+ public Text getOpener() {
+ return characters.get(characters.size() - 1);
+ }
+
+ @Override
+ public Text getCloser() {
+ return characters.get(0);
+ }
+
+ @Override
+ public Iterable<Text> getOpeners(int length) {
+ if (!(length >= 1 && length <= length())) {
+ throw new IllegalArgumentException("length must be between 1 and " + length() + ", was " + length);
}
+
+ return characters.subList(characters.size() - length, characters.size());
}
+ @Override
+ public Iterable<Text> getClosers(int length) {
+ if (!(length >= 1 && length <= length())) {
+ throw new IllegalArgumentException("length must be between 1 and " + length() + ", was " + length);
+ }
+
+ return characters.subList(0, length);
+ }
}
diff --git a/commonmark/src/main/java/org/commonmark/internal/DelimiterRun.java b/commonmark/src/main/java/org/commonmark/internal/DelimiterRun.java
deleted file mode 100644
index a8a363fa8..000000000
--- a/commonmark/src/main/java/org/commonmark/internal/DelimiterRun.java
+++ /dev/null
@@ -1,15 +0,0 @@
-package org.commonmark.internal;
-
-class DelimiterRun {
-
- final int count;
- final boolean canClose;
- final boolean canOpen;
-
- DelimiterRun(int count, boolean canOpen, boolean canClose) {
- this.count = count;
- this.canOpen = canOpen;
- this.canClose = canClose;
- }
-
-}
diff --git a/commonmark/src/main/java/org/commonmark/internal/DocumentBlockParser.java b/commonmark/src/main/java/org/commonmark/internal/DocumentBlockParser.java
index 4a30544e7..db3d3854f 100644
--- a/commonmark/src/main/java/org/commonmark/internal/DocumentBlockParser.java
+++ b/commonmark/src/main/java/org/commonmark/internal/DocumentBlockParser.java
@@ -2,6 +2,7 @@
import org.commonmark.node.Block;
import org.commonmark.node.Document;
+import org.commonmark.parser.SourceLine;
import org.commonmark.parser.block.AbstractBlockParser;
import org.commonmark.parser.block.BlockContinue;
import org.commonmark.parser.block.ParserState;
@@ -31,7 +32,7 @@ public BlockContinue tryContinue(ParserState state) {
}
@Override
- public void addLine(CharSequence line) {
+ public void addLine(SourceLine line) {
}
}
diff --git a/commonmark/src/main/java/org/commonmark/internal/DocumentParser.java b/commonmark/src/main/java/org/commonmark/internal/DocumentParser.java
index aeab876d4..07d97296b 100644
--- a/commonmark/src/main/java/org/commonmark/internal/DocumentParser.java
+++ b/commonmark/src/main/java/org/commonmark/internal/DocumentParser.java
@@ -1,27 +1,53 @@
package org.commonmark.internal;
-import java.io.BufferedReader;
-import java.io.IOException;
-import java.io.Reader;
+import org.commonmark.internal.util.LineReader;
import org.commonmark.internal.util.Parsing;
-import org.commonmark.internal.util.Substring;
import org.commonmark.node.*;
+import org.commonmark.parser.IncludeSourceSpans;
+import org.commonmark.parser.InlineParserFactory;
+import org.commonmark.parser.SourceLine;
+import org.commonmark.parser.SourceLines;
+import org.commonmark.parser.beta.LinkProcessor;
+import org.commonmark.parser.beta.InlineContentParserFactory;
import org.commonmark.parser.block.*;
+import org.commonmark.parser.delimiter.DelimiterProcessor;
+import org.commonmark.text.Characters;
+import java.io.IOException;
+import java.io.Reader;
import java.util.*;
public class DocumentParser implements ParserState {
- private static List<BlockParserFactory> CORE_FACTORIES = Arrays.asList(
- new BlockQuoteParser.Factory(),
- new HeaderParser.Factory(),
- new FencedCodeBlockParser.Factory(),
- new HtmlBlockParser.Factory(),
- new HorizontalRuleParser.Factory(),
- new ListBlockParser.Factory(),
- new IndentedCodeBlockParser.Factory());
+ private static final Set<Class<? extends Block>> CORE_FACTORY_TYPES = new LinkedHashSet<>(List.of(
+ BlockQuote.class,
+ Heading.class,
+ FencedCodeBlock.class,
+ HtmlBlock.class,
+ ThematicBreak.class,
+ ListBlock.class,
+ IndentedCodeBlock.class));
- private CharSequence line;
+ private static final Map<Class<? extends Block>, BlockParserFactory> NODES_TO_CORE_FACTORIES;
+
+ static {
+ Map<Class<? extends Block>, BlockParserFactory> map = new HashMap<>();
+ map.put(BlockQuote.class, new BlockQuoteParser.Factory());
+ map.put(Heading.class, new HeadingParser.Factory());
+ map.put(FencedCodeBlock.class, new FencedCodeBlockParser.Factory());
+ map.put(HtmlBlock.class, new HtmlBlockParser.Factory());
+ map.put(ThematicBreak.class, new ThematicBreakParser.Factory());
+ map.put(ListBlock.class, new ListBlockParser.Factory());
+ map.put(IndentedCodeBlock.class, new IndentedCodeBlockParser.Factory());
+ NODES_TO_CORE_FACTORIES = Collections.unmodifiableMap(map);
+ }
+
+ private SourceLine line;
+
+ /**
+ * Line index (0-based)
+ */
+ private int lineIndex = -1;
/**
* current index (offset) in input line (0-based)
@@ -33,75 +59,110 @@ public class DocumentParser implements ParserState {
*/
private int column = 0;
+ /**
+ * if the current column is within a tab character (partially consumed tab)
+ */
+ private boolean columnIsInTab;
+
private int nextNonSpace = 0;
private int nextNonSpaceColumn = 0;
- private boolean blank;
-
private int indent = 0;
+ private boolean blank;
private final List<BlockParserFactory> blockParserFactories;
- private final InlineParserImpl inlineParser;
+ private final InlineParserFactory inlineParserFactory;
+ private final List<InlineContentParserFactory> inlineContentParserFactories;
+ private final List<DelimiterProcessor> delimiterProcessors;
+ private final List<LinkProcessor> linkProcessors;
+ private final Set<Character> linkMarkers;
+ private final IncludeSourceSpans includeSourceSpans;
+ private final int maxOpenBlockParsers;
private final DocumentBlockParser documentBlockParser;
+ private final Definitions definitions = new Definitions();
- private List<BlockParser> activeBlockParsers = new ArrayList<>();
- private Set<BlockParser> allBlockParsers = new HashSet<>();
- private Map<Node, Boolean> lastLineBlank = new HashMap<>();
+ private final List<DocumentParser.OpenBlockParser> openBlockParsers = new ArrayList<>();
+ private final List<BlockParser> allBlockParsers = new ArrayList<>();
- public DocumentParser(List<BlockParserFactory> blockParserFactories, InlineParserImpl inlineParser) {
+ public DocumentParser(List<BlockParserFactory> blockParserFactories, InlineParserFactory inlineParserFactory,
+ List<InlineContentParserFactory> inlineContentParserFactories, List<DelimiterProcessor> delimiterProcessors,
+ List