Opened 4 years ago
Closed 20 months ago
#5344 closed defect (bug) (fixed)
Delete stale, orphaned topic tags
Reported by: | jonoaldersonwp | Owned by: | dd32 |
---|---|---|---|
Milestone: | Priority: | low | |
Component: | Support Forums | Keywords: | seo |
Cc: |
Description (last modified by )
Topic tags which have only one related post, where that post was created more than a year ago, should be either consolidated into related/similar tags, or when there's no suitable candidate for consolidation, deleted (and a 410 status returned).
To facilitate the unavoidably manual process of consolidation, would it be possible to expose a list of URLs of tags which meet these criteria?
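As a rough illustration of the kind of list being requested, here is a minimal sketch. It assumes a hypothetical export of tags (slug, number of related topics, date of the most recent related topic) and a placeholder URL pattern; none of these names come from the actual forum code.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Tag:
    slug: str
    topic_count: int
    last_topic_date: date


# Placeholder URL pattern, for illustration only.
BASE_URL = "https://example.org/support/topic-tag/"


def stale_tag_urls(tags, today, max_age_days=365):
    """Return URLs of tags with exactly one related topic older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        BASE_URL + tag.slug + "/"
        for tag in tags
        if tag.topic_count == 1 and tag.last_topic_date < cutoff
    ]
```

The age threshold is a parameter so the cutoff can be tuned without touching the selection logic.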
Attachments (2)
Change History (27)
This ticket was mentioned in Slack in #meta by jonoaldersonwp. View the logs.
4 years ago
This ticket was mentioned in Slack in #forums by yui. View the logs.
4 years ago
This ticket was mentioned in Slack in #meta by tellyworth. View the logs.
4 years ago
#6
@
4 years ago
5344-limit-topic-creation.patch is the first step in resolving this.
It will limit the creation of new topic tags to moderators or above (this can be further refined if needed, but seems like the natural point).
This means the tag field remains as it is while we work on shortening the tags list, but any attempt to write a tag that does not exist is just skipped.
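The behaviour described above can be sketched as follows. This is a simplified model in Python, not the patch itself; the function name and the `can_create_tags` flag are hypothetical stand-ins for the moderator capability check.

```python
def filter_submitted_tags(submitted, existing, can_create_tags):
    """Return the tags that would actually be attached to a topic.

    Users who may create tags (moderators or above in this model) keep
    everything they typed; for everyone else, unknown tags are skipped.
    """
    if can_create_tags:
        return list(submitted)
    return [tag for tag in submitted if tag in existing]
```

For a regular user submitting `["css", "brand-new", "menu"]` against existing tags `{"css", "menu"}`, only `css` and `menu` survive.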
This ticket was mentioned in Slack in #meta by clorith. View the logs.
4 years ago
#8
@
4 years ago
@Clorith It seems like returning a WP_Error there will abort inserting of all terms after the non-existent one. See https://core.trac.wordpress.org/browser/trunk/src/wp-includes/taxonomy.php#L2602 where the function returns instead of continuing the loop.
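To make the difference concrete, here is a simplified Python model of the two behaviours. It is not the core function, only an assumed sketch of a loop that returns on the first rejected term versus one that skips it and carries on.

```python
def insert_terms_aborting(terms, existing):
    """Stop at the first unknown term; later terms are never processed."""
    inserted = []
    for term in terms:
        if term not in existing:
            return inserted  # early return ends the whole loop
        inserted.append(term)
    return inserted


def insert_terms_skipping(terms, existing):
    """Skip unknown terms but keep processing the rest."""
    inserted = []
    for term in terms:
        if term not in existing:
            continue  # only this term is dropped
        inserted.append(term)
    return inserted
```

With the unknown term in the middle of the list, the first variant loses every term after it, while the second keeps them. With the unknown term last, both give the same result.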
#9
@
4 years ago
Good catch @Otto42, I had of course only tested with new tags coming in last so completely missed that!
5344-limit-topic-creation.2.patch takes a more roundabout way to reach the same goal. I didn't spot any good core way of preventing tag creation, but there are some bbPress filters that could be utilized to achieve what was needed here.
The new topic creation is a bit more involved than previously. I don't think we add any secondary taxonomies, but the approach taken futureproofs it in case we need to (or actually do already), and it shouldn't add any noteworthy overhead with its checks, as these are all features that are used elsewhere in the process, so the responses would already be in memory for the request.
This ticket was mentioned in Slack in #meta by clorith. View the logs.
4 years ago
This ticket was mentioned in Slack in #meta by clorith. View the logs.
4 years ago
This ticket was mentioned in Slack in #meta by clorith. View the logs.
4 years ago
#14
follow-ups:
↓ 15
↓ 18
@
4 years ago
Just noting that I cleaned up 5344-limit-topic-creation.2.patch a little bit, and moved it to Performance Optimizations, so that this only affects the English support forums, and not all the localised support forums.
#15
in reply to:
↑ 14
@
4 years ago
Replying to dd32:
Just noting that I cleaned up 5344-limit-topic-creation.2.patch a little bit, and moved it to Performance Optimizations, so that this only affects the English support forums, and not all the localised support forums.
Nice one!
#18
in reply to:
↑ 14
;
follow-ups:
↓ 19
↓ 23
@
4 years ago
Replying to dd32:
Just noting that I cleaned up 5344-limit-topic-creation.2.patch a little bit, and moved it to Performance Optimizations, so that this only affects the English support forums, and not all the localised support forums.
Sounds reasonable for now (I seem to recall rosetta also wanting this at some point, but I think it makes sense to revisit that once the full on tag-upgrade-process is in place, since the curated tags are only really valuable once we've also implemented a way to more sanely choose tags, what I like to call phase 3 of this).
For the next phase, I'm thinking removing all "undesirables" from the tag list makes sense as a first step. Doing so will reduce the total dataset we have to work with, and make it much easier to determine the overarching tag hierarchy needed to group the remaining tags.
I'm thinking something initially manageable like removing tags that:
- Have a numeric only slug (these are purely HTML entity tags from a quick check, and even if the tag was fully numeric any way, numbers alone give no context and as such hold no value)
- Have fewer than 5 uses
- Are literally the term WordPress, or wp (since it's fairly redundant to tag a topic as being about WordPress, on a WordPress support forum)
I'll lean a bit on @jonoaldersonwp to sanity check that these sound like sensible criteria for a first set of removable tags.
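The three criteria above can be sketched as a single predicate. This is only an illustration in Python; the function and constant names are hypothetical, and the threshold of 5 comes from the list above.

```python
REDUNDANT_SLUGS = {"wordpress", "wp"}
MIN_USES = 5


def is_removable(slug, uses):
    """True if a tag matches any of the proposed first-pass removal criteria."""
    numeric_only = slug.isascii() and slug.isdigit()
    too_rare = uses < MIN_USES
    redundant = slug.lower() in REDUNDANT_SLUGS
    return numeric_only or too_rare or redundant
```

A tag only needs to match one criterion, so a heavily used numeric slug would still be removed.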
#19
in reply to:
↑ 18
@
4 years ago
Those sound reasonable to me as a first set to remove.
Replying to Clorith:
I'm thinking something initially manageable like removing tags that:
- Have a numeric only slug (these are purely HTML entity tags from a quick check, and even if the tag was fully numeric any way, numbers alone give no context and as such hold no value)
- Have fewer than 5 uses
Looking at the current list of tags, here's the counts:
Topics | Tags w/ that many topics |
---|---|
0 | 36,724 |
1 | 492,877 |
2 | 53,658 |
3 | 19,359 |
4 | 10,231 |
5 | 6,288 |
6 | 4,271 |
7 | 3,233 |
8 | 2,484 |
9 | 1,945 |
>10 | 22,257 |
Combining the removal of tags with fewer than 5 uses and those whose slugs are just numeric, we'd be removing 613k tags, leaving 40k tags behind.
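The rounded totals can be checked directly from the counts quoted above. The sketch below only sums the published buckets; numeric-only slugs with 5 or more uses are not broken out in that data, so they are not counted separately here.

```python
# (topics per tag, number of tags), copied from the counts above.
# The last bucket is labelled ">10" in the source.
TAG_COUNTS = [
    ("0", 36724), ("1", 492877), ("2", 53658), ("3", 19359), ("4", 10231),
    ("5", 6288), ("6", 4271), ("7", 3233), ("8", 2484), ("9", 1945),
    (">10", 22257),
]

BELOW_THRESHOLD = {"0", "1", "2", "3", "4"}  # fewer than 5 uses

removed = sum(n for label, n in TAG_COUNTS if label in BELOW_THRESHOLD)
kept = sum(n for label, n in TAG_COUNTS if label not in BELOW_THRESHOLD)

print(removed)  # 612849, roughly 613k
print(kept)     # 40478, roughly 40k
```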
- Are literally the term WordPress, or wp (since it's fairly redundant to tag a topic as being about WordPress, on a WordPress support forum)
There's quite a few which match that too, but I suspect that list is going to be harder to come up with, although, looking at the most used tags, there's a number of obvious ones.
Eg, top 20 topic-tags in use:
name | count |
---|---|
woocommerce | 43,541 |
plugin | 38,768 |
wordpress | 37,200 |
error | 36,563 |
css | 29,316 |
theme | 25,310 |
menu | 21,195 |
php | 18,578 |
image | 18,525 |
header | 18,308 |
images | 18,149 |
post | 16,522 |
posts | 16,094 |
categories | 15,393 |
sidebar | 15,298 |
widget | 14,654 |
category | 14,577 |
Comments | 13,920 |
login | 13,784 |
multisite | 13,680 |
#21
@
4 years ago
That sounds great, I think we'll hold off on removing any of the other top-20 tag uses (with the exception of wordpress and wp), as some of them are plugin slugs.
The reasoning is simple: some plugin authors follow their slugs to catch topics, and depending on how they do this following, they might get odd errors on their end. I would like to announce the removal of these properly first, to ensure a good transition for those who may be using them.
#23
in reply to:
↑ 18
@
4 years ago
Replying to Clorith:
I'm thinking something initially manageable like removing tags that:
- Have a numeric only slug (these are purely HTML entity tags from a quick check, and even if the tag was fully numeric any way, numbers alone give no context and as such hold no value)
Done. All of the affected 150 terms were either numeric, or only contained non-[a-z0-9] characters like #:)
- Have fewer than 5 uses
Done.
- Are literally the term WordPress, or wp
Done.
I removed wordpress, wp, plugin, theme, and error. These were just the generic tags on the first page of edit-tags.php that I didn't see offering any value.
Happy to remove any other similarly generic ones if provided a list. WordPress cannot delete these terms directly itself, as it's not optimized for such operations on large tags (it fetches, diffs, and sets the tags for each topic, rather than just deleting the tag for the topic, which is much faster).
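The difference between the two approaches can be illustrated with a small model. This is a sketch using an in-memory SQLite database with a hypothetical `topic_tags` table; it is not WordPress's actual schema or code, and the statement counts only show how the per-topic approach grows with the number of affected topics.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with a hypothetical (topic_id, tag) table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE topic_tags (topic_id INTEGER, tag TEXT)")
    db.executemany("INSERT INTO topic_tags VALUES (?, ?)", rows)
    return db


def remove_tag_per_topic(db, tag):
    """Fetch, diff, and rewrite the tag set of every affected topic."""
    statements = 1
    topic_ids = [r[0] for r in db.execute(
        "SELECT DISTINCT topic_id FROM topic_tags WHERE tag = ?", (tag,))]
    for topic_id in topic_ids:
        current = {r[0] for r in db.execute(
            "SELECT tag FROM topic_tags WHERE topic_id = ?", (topic_id,))}  # fetch
        remaining = current - {tag}  # diff
        db.execute("DELETE FROM topic_tags WHERE topic_id = ?", (topic_id,))  # set
        db.executemany("INSERT INTO topic_tags VALUES (?, ?)",
                       [(topic_id, t) for t in sorted(remaining)])
        statements += 3
    return statements


def remove_tag_directly(db, tag):
    """Delete every relationship row for the tag in a single statement."""
    db.execute("DELETE FROM topic_tags WHERE tag = ?", (tag,))
    return 1
```

Both functions leave the table in the same state; the per-topic version issues three statements for each topic carrying the tag, which is what becomes expensive for a tag used tens of thousands of times.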
Note: I took a snapshot of the terms prior to running this, I can revert any individual change or triple check what the data was before this if required for anything.
I have said list and started work on a quick script to find similarities and consolidation for the tags so we can merge / remove them.
But I wanted a proper solution for tags before doing that work fully, fix the cause first, then the symptom or it'll get out of hand again (there's currently close to 600k tags)