Playtest Early, Playtest Often

I am always spinning a few different plates in terms of design. When that isn’t obvious from my blog posts, it’s because I tend to want to spare you that level of nitty-gritty (or perhaps because I want the nit and the grit to be something only my Patreon patrons are privy to), and also because if I took the time to write a little editorial every time I tweaked a rule, I would soon have no spare time to do said tweaking. However, I wanted to talk about a recent example where playtesting revealed an unintended result, and how I responded.

Last year, I declared 2025 the Year of the Beta, and for a long time I struggled to get my own beta rules off the ground. However, shamed by my failure to meet my own called shot, I ended up cobbling together a few different snippets of rules I was working on and organized some playtests thereof. The reason I was previously reluctant to do something like this is that I thought the rules needed to be more complete before I dragged them out. That is, I envision (and have mostly designed) all these interconnected rules, and testing just part of them felt wobbly, like sitting on a stool with too few legs. Surely it is better to wait until the plane has all its parts before you begin to fly it. However, games are neither stools nor planes, and they benefit from big spectacular crashes.

I recently designed a new core mechanic (influenced by one of my blog posts from last year) and was happily beavering along building a system around it. However, had I gone along with my earlier plan of designing a big-ass, baroque system around this core mechanic without ever giving it a spin, I would have risked the XKCD “Complex Structure Supported by a Tiny Part” problem, where that tiny part turns out to be defective. It is much easier to fix something early on than after you have built an elaborate structure atop it. So I decided to finally test the rules with some of the gaming sickos who hang out in my Discord server.

The basic gist of the (now twice-modified/entirely retooled, but we will get to that) core mechanic being tested is that player characters have stats ranging from 2-12, for tests the referee chooses a difficulty value ranging from 0-7, and the player rolls one or more d12s, trying to roll below their stat and above the difficulty value. None of this is super innovative; it is basically the core mechanic of Errant except using d12s instead of d20s. The main differences are that (1) per the above-referenced blog post, player characters have advantage by default so long as they aren’t over-encumbered, and (2) there is a range of possible results beyond the binary. If you roll below the stat and above the difficulty value, that is a success; if you roll above the stat, that is a failure; if you roll below the stat but also below the difficulty value, that is a partial success (I call it a stumble); if your roll matches your stat, that is a critical success (I call it a triumph); and if your roll matches the difficulty value, that is a critical failure (I call it a fumble).

Even with a very small (and statistically insignificant) sample size generated by the game, because of the advantage and the smaller die size, critical successes were happening way more often than I liked. Sure, you can determine how often they are likely to occur based on the abstract realm of math, but you don’t really get the gamefeel of it all until you literally feel it in a game. That is why you playtest. Spending hours on AnyDice is insufficient. (Also gaming is fun; it is the point of all this.) 
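For the math-inclined, here is a rough back-of-the-envelope sketch of why triumphs showed up so often. It assumes that rolling with advantage means a triumph happens whenever at least one die lands exactly on the stat; that reading, and the name crit_chance, are illustrative assumptions rather than the exact playtest rules.

```python
from fractions import Fraction

def crit_chance(die_size, dice_rolled):
    """Chance that at least one die shows one specific face (e.g. the stat).

    Assumption: with advantage, a single matching die is enough for a triumph.
    """
    miss_one_die = Fraction(die_size - 1, die_size)
    return 1 - miss_one_die ** dice_rolled

d12_with_advantage = crit_chance(12, 2)   # 23/144, roughly 16%
d20_single_die = crit_chance(20, 1)       # 1/20, exactly 5%

print(d12_with_advantage, float(d12_with_advantage))
print(d20_single_die, float(d20_single_die))
```

Under that assumption, a default roll of 2d12 crits a little over three times as often as a lone d20 does. The number only tells you how often, though, not how it feels at the table.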

So I changed the core mechanic, specifically changing how critical successes and critical failures happen and therefore how often they will happen. I’ll describe it all below, and by reading you may feel you have a good sense of how this all would feel in game, but you don’t. I don’t, at least not as of writing this post. But I will in a week’s time when I try out an updated core mechanic on my unsuspecting (no, they suspect it, I’m sure) playtesters. And maybe you will if you have a group of unwitting players who are adventurous enough not to groan in disapproval when you announce that you read a new thing online and you want to give it a shot this week instead of the usual (boring) d20+modifier versus threshold they’re all used to. If you do so, I want to know only one thing: how did it feel?

An Updated Core Mechanic

When a character attempts a task, the referee may call for them to Test an Attribute if the outcome is uncertain, failure would have meaningful consequences, and even success may come at a cost. A simple mantra: no risks, no rolls. 

Advantage and disadvantage work the way you probably expect (roll multiple dice [in this case d12s] and take the best or worst result) except multiple sources can stack. 

  1. The player describes what they want to accomplish and how they plan to do so.

  2. The referee calls for a Test if success or failure of the task would be a meaningful outcome and when success can come with a complication.

  3. The referee picks the most relevant Attribute for the task.

  4. The referee sets the Degree of Challenge (“DC”) for the Test.

  5. The player rolls 1d12 (or more than one, if made at Advantage or Disadvantage) comparing the result with the relevant Attribute and DC.

  6. The referee describes the outcome of the Test, including any setbacks, based on the degree of success.

Degrees of Success

When the roll would be a Success but any dice match, the Test is a Triumph. On a Triumph, they succeed beyond their wildest expectations, and the referee describes some additional boon.

When the roll is below the Attribute and equal to or above the DC, the Test is a Success. Alternatively, if the roll would be a Stumble but any dice match, the Test is a Success. On a Success, they achieve their stated goal as intended.

When the roll is below both the Attribute and the DC, the Test is a Stumble. On a Stumble, they accomplish their goal, but the referee describes an unintended setback or cost. 

When the roll is equal to or above the Attribute, the Test is a Failure. On a Failure, they fail at their intended task. If they are a PC, they also Mark the Attribute.

When the roll would be a Failure but any dice match, the Test is a Fumble. On a Fumble, they fail at their intended task, and the referee describes an unintended setback.
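The degrees of success above can be sketched in a few lines of code. This is only one possible reading: it assumes that "any dice match" means two or more of the rolled dice show the same face, that Advantage keeps the die with the best outcome, and that Disadvantage keeps the worst. The function names are made up for illustration.

```python
from collections import Counter

# Base outcomes ranked from worst to best, used to pick the kept die.
RANK = {"Failure": 0, "Stumble": 1, "Success": 2}

# Assumption: matching dice bump the kept outcome as described in the text.
UPGRADE = {"Success": "Triumph", "Stumble": "Success", "Failure": "Fumble"}

def base_outcome(roll, attribute, dc):
    """Classify a single d12 against the Attribute and the DC."""
    if roll >= attribute:
        return "Failure"
    if roll >= dc:
        return "Success"
    return "Stumble"

def resolve_test(dice, attribute, dc, disadvantage=False):
    """Resolve a Test from the list of d12 results that were rolled."""
    outcomes = [base_outcome(die, attribute, dc) for die in dice]
    pick = min if disadvantage else max
    kept = pick(outcomes, key=RANK.__getitem__)
    any_match = any(count > 1 for count in Counter(dice).values())
    return UPGRADE[kept] if any_match else kept

print(resolve_test([5, 9], attribute=8, dc=3))  # Success
print(resolve_test([5, 5], attribute=8, dc=3))  # Triumph
print(resolve_test([9, 9], attribute=8, dc=3))  # Fumble
```

Under this reading, a default roll of 2d12 produces matching dice 1 time in 12, so Triumphs and Fumbles together show up on about 8% of rolls.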

Degree of Challenge (DC)

The values below can be used as a rough guideline for establishing the DC for a task. The referee should state the DC before the players roll for the Test.

Easy tasks are DC 1, Challenging tasks are DC 3, Arduous tasks are DC 5, and Herculean tasks are DC 7.

Edge Cases on Tests

  • When characters are working together to complete a task and only one needs to succeed, the two most well-suited characters that are helping make Tests and use the best result. 

  • Some tasks, like forging a signature, are harder with two sets of hands holding the pen. If too many cooks would spoil the broth, only the most involved character makes the Test. 

  • When a group of characters is acting in concert and all of them need to succeed together, such as when sneaking past sentries, the most and least well-suited characters participating make Tests and use the worst result. 

  • If multiple characters are attempting the same task but only one can succeed or when one character acts simultaneously to prevent another from succeeding, they all make Tests and the one who rolls highest while still succeeding at the Test wins. If the roll is a tie, it is a draw and the status quo prevails. If no one succeeds, it is a draw, and the referee may present an unexpected setback.

  • A failed Test cannot be repeated unless and until the situation is drastically altered in some way. Even then, sometimes the moment has passed and the opportunity has been squandered. If you have already failed to catch the ceramic urn containing the ashes of a revered sage, you can’t try again once it smashes on the ground beneath your feet. 
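The contested case in the edge cases above (several characters, only one winner) is the fiddliest to hold in your head, so here is a small sketch of it. It assumes each contestant has already settled on a single kept die, and that "succeeding" means rolling below the Attribute (so a Stumble still counts); both are assumptions for illustration, as is the function name.

```python
def resolve_contest(entries):
    """Return the winner's name, or None when the contest is a draw.

    entries maps a character name to a (kept_roll, attribute) pair.
    Assumption: a roll below the Attribute counts as succeeding.
    """
    succeeded = {
        name: roll
        for name, (roll, attribute) in entries.items()
        if roll < attribute
    }
    if not succeeded:
        return None  # nobody succeeds: a draw, maybe with a setback
    highest = max(succeeded.values())
    leaders = [name for name, roll in succeeded.items() if roll == highest]
    return leaders[0] if len(leaders) == 1 else None  # tied rolls: a draw

print(resolve_contest({"thief": (6, 9), "guard": (4, 7)}))   # thief
print(resolve_contest({"thief": (6, 9), "guard": (6, 8)}))   # None
print(resolve_contest({"thief": (10, 9), "guard": (3, 8)}))  # guard
```

This is the blackjack flavor in miniature: get as high as you can without going over your own stat.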

You Are Never Done (Until You Are)

I ran even more playtests in the later part of last year using the above rules, but they still were not clicking. I like the “blackjack” mechanic that I was using (stolen from Errant), but with all the other complexity that was going on, the extra nuance, or perhaps kludge, felt like too much sand in the gears. I was already riddled with doubts about the system when a wonderful playtester offhandedly remarked that rolling in the middle just is not as inherently satisfying as rolling high (or low). A simple truth, out of the mouths of babes.

So I changed it. One of the core parts of the rules is that the stats are spendable resources, which led to a common question from players: “Wait, am I rolling against the max or the current value of my stat?” I wanted to keep the degrees of success, so I decided to jettison the difficulty context and lean into the stat-spending by building the answer to that FAQ into the core mechanic. I have just one session under my belt using these rules, but so far the peanut butter is tasting much smoother.

An Updated, Updated Core Mechanic

  1. The player describes what they want to accomplish and how they plan to do so.

  2. The DM calls for a Test if success or failure of the task would be a meaningful outcome and when success can come with a complication.

  3. The DM picks the most relevant Attribute for the task.

  4. The player rolls 1d20 (or more than one, if made at Advantage or Disadvantage) comparing the result with the relevant Attribute (Maximum and Remainder).

  5. The DM describes the outcome of the Test, including any setbacks, based on the degree of success.

When the roll is a Natural 1, the Test is a Triumph. On a Triumph, they succeed beyond their wildest expectations, and the DM describes some additional boon.

When the roll is below both the Attribute Maximum and its Remainder, the Test is a Success. On a Success, they achieve their stated goal as intended.

When the roll is below the Attribute Maximum but equals or exceeds its Remainder, the Test is a Stumble. On a Stumble, they accomplish their goal, but the DM describes an unintended setback or cost.

When the roll equals or exceeds the Attribute Maximum, the Test is a Failure. On a Failure, they fail at their intended task. If they are a PC, they also Mark the Attribute.

When the roll is a Natural 20, the Test is a Fumble. On a Fumble, they fail at their intended task, and the DM describes a further setback.
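As a rough sketch, the d20 version boils down to a short chain of checks on the kept die. It assumes that the Remainder is whatever is currently left of a spendable Attribute (so it never exceeds the Maximum) and that a Natural 1 or Natural 20 overrides everything else; the function name is hypothetical.

```python
def resolve_d20_test(roll, maximum, remainder):
    """Classify one kept d20 result against an Attribute.

    Assumptions: remainder <= maximum, and naturals take priority.
    """
    if roll == 1:
        return "Triumph"
    if roll == 20:
        return "Fumble"
    if roll >= maximum:
        return "Failure"
    if roll >= remainder:
        return "Stumble"
    return "Success"

# Tally every face of the d20 for an Attribute with Maximum 12, Remainder 8.
tally = {}
for face in range(1, 21):
    outcome = resolve_d20_test(face, maximum=12, remainder=8)
    tally[outcome] = tally.get(outcome, 0) + 1

print(tally)
# {'Triumph': 1, 'Success': 6, 'Stumble': 4, 'Failure': 8, 'Fumble': 1}
```

Spending the Attribute down shrinks the Success band and grows the Stumble band, while the Failure band stays pinned to the Maximum.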

Did I just retvrn to the beauty and simplicity of d20 Roll-Under? Maybe, but it’s all about the journey, ain’t it?


My blog posts are shared early as a reward to all of my Patreon supporters. If you want to support my blog, games, and get early access to all such endeavors, then get thee hence to my Patreon. You can also support my efforts by buying any of my games from my shop.
