Opened 7 years ago

Closed 6 years ago

Last modified 5 years ago

#2900 closed defect (bug) (fixed)

incorrect parsing of markdown in @param blocks

Reported by: pbiron's profile pbiron Owned by: coffee2code's profile coffee2code
Milestone: Priority: normal
Component: Developer Hub Keywords: has-screenshots


There is a problem with phpdoc-parser incorrectly parsing some markdown in DocBlocs.

For example, see the $name__like @param in the hash of $query in WP_Term_Query::__construct().

The problem is actually in vendor/erusev/parsedown/Parsedown.php, rather than in phpdoc-parser, itself.

I just submitted a Pull Request against parsedown with a fix. If/when they accept/merge that PR, then DevHub should pull it and rebuild everything in the Code Reference.

If they don't accept/merge (or take a long time to do so), then a short term workaround is to modify the hash in all DocBlocs that use this pattern to be like:

 * ...
 *    @type string $name\__like  Retrieve terms with criteria by which a term is LIKE `$name\__like.
 * ...

i.e., add escapes to all occurrences of __ if when it appears twice in the same @param when the occurrence is not supposed indicate strong.

Attachments (1)

markdown-parsing-error.png (15.6 KB) - added by pbiron 7 years ago.
screenshot of WP_Term_Query::construct()

Download all attachments as: .zip

Change History (16)

7 years ago

screenshot of WP_Term_Query::construct()

#1 @pbiron
7 years ago

  • Keywords has-screenshots added

#2 @DrewAPicture
7 years ago

We already partially handle for this case in the \DevHub\get_params() function in the wporg-developer theme.

Doesn't make it any less of a problem, but we have already attempted to address this on DevHub. Though I notice that we're doing a str_replace() against $tag which is an array – should be $tag['content'].

Anyway, the problem in this particular case isn't actually bolding, it looks like the closing </code> tag isn't getting parsed properly from the closing backtick in the description for $name__like , which actually causes everything below it to be <code> formatted.

Last edited 7 years ago by DrewAPicture (previous) (diff)

#3 @DrewAPicture
7 years ago

Assuming we fix the $tag vs $tag['content'] thing, there's a still a problem with an opening <code> tag where it should be a closing </code> tag.

Here's what the raw section description looks like coming from parsedown before wporg-developer manipulates it for display:

Retrieve terms with criteria by which a term is LIKE                                                `$name__like<code>. Default empty.     @type string       $description__like

Following the fixed str_replace(), this is what we get:

Retrieve terms with criteria by which a term is LIKE                                                `$name</strong>like<code>. Default empty.     @type string       $description__like

Note the presence of the backtick before $name and <code> after like_

#4 @DrewAPicture
7 years ago

After a bit more digging, I'm thinking this might have to do with the <code> tags exception in format_param_description() added in [2206]. Removing 'code' from the array fixes the problem for display in my minimal testing.

#5 @DrewAPicture
7 years ago

Disregard what I said about $tag needing to be $tag['content']. Forgot that the subject in str_replace() can also be an array. Either way, in the scope of DevHub, the current problem with display has to do with the open <code> tag as described above.

Last edited 7 years ago by DrewAPicture (previous) (diff)

#6 @pbiron
7 years ago

Sorry I wasn't more clear in the description of this ticket.

The bug in parsedown only surfaces when all of the following are true:

  1. the param name contains __ (which parsedown treats as the start of <strong>)
  2. the param name also occurs in the param's description (which causes parsedown to treat the occurrence of __ in the param name as </strong>)
  3. the param name is wrapped in backticks, `, when it occurs in the param's description (because parsedown hasn't yet closed </strong>, it doesn't encode the opening backtick but does encode the closing backtick as the start of <code>)

Thus, the parsedown bug results in "unbalanced" tags. It's analogous to the the "Save as HTML/XML" bugs in old versions of MS Word, which often produced things like:

<p>this <b>is a <em>test</b> of the emergency</em> broadcast system</p>

when the Word document contained overlapping formatting.

I don't think it's worth hacking the DevHub theme to get around the parsedown bug, because trying to "rebalance" overlapping tags is hard and never foolproof (plus the fix to parsedown is so easy).

In case you didn't check out the PR I made against parsedown, you can temporarily fix your local copy by changing:

protected $StrongRegex = array(
    '*' => '/^[*]{2}((?:\\\\\*|[^*]|[*][^*]*[*])+?)[*]{2}(?![*])/s',
    '_' => '/^__((?:\\\\_|[^_]|_[^_]*_)+?)__(?!_)/us',


protected $StrongRegex = array(
    '*' => '/^[*]{2}((?:\\\\\*|[^*]|[*][^*]*[*])+?)[*]{2}(?![*])/s',
    '_' => '/^__((?:\\\\_|[^_]|_[^_]*_)+?)__(?!_)\b/us',

and then reslurping the source for class-wp-term-query.php and you'll see that there is no need to even touch the DevHub theme.

This ticket was mentioned in Slack in #docs by pbiron. View the logs.

7 years ago

This ticket was mentioned in Slack in #core-docs by zzap. View the logs.

7 years ago

#9 @SergeyBiryukov
7 years ago

#3221 was marked as a duplicate.

This ticket was mentioned in Slack in #docs by coffee2code. View the logs.

6 years ago

#11 @coffee2code
6 years ago

  • Owner set to coffee2code
  • Resolution set to fixed
  • Status changed from new to closed

In 6448: Fix an upstream Parsedown bug that introduces unbalanced 'code' tags.

The bug still exists upstream and this "fix" merely seeks to undo the damage. Can be removed if/when Parsedown is fixed (but shouldn't pose a problem if it isn't).

Props pbiron.
Fixes #2900.

#12 @coffee2code
6 years ago

@pbiron: Thanks for your work on this, identifying the combination of criteria that triggers the bug and with submitting a PR upstream!

The commit above "fixes" the errant markup produced by the Parsedown bug. While the bug still exists upstream, I marked this ticket as fixed since there's not much else we can do on the DevHub side of things.

If you want to keep the issue alive, a ticket on the phpdoc-parser GitHub repo would seem more appropriate at this stage, to document that the bug exists, that it's a result of a bug in a dependency, and link to the PR that fixes it. With that as a reminder, someone can periodically check upstream to see if it ever gets fixed.


This ticket was mentioned in Slack in #meta-devhub by pbiron. View the logs.

6 years ago

#14 @ocean90
5 years ago

@coffee2code Can the same be applied to changelog entries? See for an example.

#15 @coffee2code
5 years ago

In 8466:

Developer: Fix parser's conversion of double underscores to strong text in parameter descriptions.

The parser interprets __ as either as an opening or closing <strong>, which is never intended. This was already addressed for params via get_params(), but more broadly to affect other aspects of the param not just description. However, format_param_description() is used in other similar contexts, so this fix is needed there.

See #2900.

Note: See TracTickets for help on using tickets.