Annual Review Reference Card

Paid community annual review — six-layer checklist, cohort retention interpretation, activation thresholds, channel engagement multiplier, moderation benchmarks, and decision output format

Q: What is a healthy cohort retention rate for a paid community?

No single threshold applies, because the right benchmark depends on community size, price point, and topic domain. The annual review should look at the pattern and direction of retention across cohorts rather than a single retention number. The four diagnostic patterns are:

(1) Consistent drop at months 2–3 across all cohorts: an onboarding problem. Members across every join period are hitting the same friction point in the same tenure window, which localizes the fix to the first 90-day experience.

(2) Steep drop at months 5–8 in specific cohorts only, while other cohorts are flat through that tenure window: a calendar-period programming problem. The affected cohorts reached months 5–8 during a period when programming was weaker, and identifying which cohorts dropped shows you the date range to investigate.

(3) Flat through month 6, then decline at months 8–12 across all or most cohorts: a long-tenure value problem. Members engage through the onboarding window and the early-value phase, but the community does not have programming or content that justifies continued membership at the 8–12 month mark.

(4) Newer cohorts retaining better than older cohorts at the same tenure window: a positive signal. Identify the change that correlates with the improvement and determine whether it is durable.

The benchmark question to ask is not ‘is our retention above 80%?’ but ‘which cohorts are dropping and at which tenure window, and what changed before or at their join month?’ See Table 2 in this reference card for the full four-pattern interpretation table with signals and decisions for each.

This page is a structured reference card for paid community operators running their annual review. It covers: a six-layer review checklist table (what to measure, where to get the data, healthy threshold, worrying threshold, and what decision each layer informs); a cohort retention rate interpretation table (four diagnostic patterns with what each signals and what to do); an activation rate threshold table by community size bracket with seasonal adjustment flags; a channel engagement multiplier table (thresholds, when to reinvest vs. restructure, and the “why below 1 does not always mean cut” note for reference channels); a moderation load benchmark table (per-100-active-member incident rates, rising-trend signal, new-member vs. long-tenure breakdown); and the annual review decision output format (a three-section template with max-3 items per section, owner, and timeline). For the strategic reasoning — including why reviewing economics before retention produces uninterpretable conclusions, how the six-layer causal chain works, and what the most common annual review failure mode is — see the companion post: Paid community annual review: the six-layer framework. This card is for the operator who understands the reasoning and needs the checklists, thresholds, and decision template in quick-reference form.

TL;DR

Run the review in this sequence: cohort retention rate → activation rate trend → channel engagement by channel → moderation load → NPS trend → economics. Do not start with economics. The six-layer sequence exists because each layer is caused by the layers before it; economics are the output of the first five layers, not a diagnostic tool. Use the tables below in order. Each table produces a specific input to the decision output format in Table 6: what to fix (max 3), what to stop (max 3), what to scale (max 3). Each decision item must have an owner and a timeline. A review that produces no decision items with owners and timelines is a reporting exercise, not a review.
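The decision output constraints described above (three sections, at most three items each, every item with an owner and a timeline) can be sketched as a small data structure with a validation step. This is a minimal illustration, not the card's official template: the names `DecisionItem`, `ReviewOutput`, and `problems`, and the free-text timeline format, are assumptions introduced here.

```python
from dataclasses import dataclass, field

MAX_ITEMS_PER_SECTION = 3  # "max 3" per section, as stated in the TL;DR


@dataclass
class DecisionItem:
    description: str
    owner: str     # person accountable for the item
    timeline: str  # free text such as "by end of Q1" (format is an assumption)


@dataclass
class ReviewOutput:
    fix: list = field(default_factory=list)
    stop: list = field(default_factory=list)
    scale: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable problems; an empty list means the output is usable."""
        found = []
        sections = {"fix": self.fix, "stop": self.stop, "scale": self.scale}
        for name, items in sections.items():
            if len(items) > MAX_ITEMS_PER_SECTION:
                found.append(f"{name}: {len(items)} items, max is {MAX_ITEMS_PER_SECTION}")
            for item in items:
                if not item.owner.strip() or not item.timeline.strip():
                    found.append(f"{name}: '{item.description}' needs an owner and a timeline")
        if not any(sections.values()):
            found.append("no decision items: this is a report, not a review")
        return found
```

Calling `problems()` at the end of the review gives a quick check that the output is a set of owned, dated decisions rather than a list of observations.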

Table 1 — Six-layer annual review checklist

The six review layers, in the correct sequence. The sequence is causal: cohort retention rate determines how many members stay long enough to activate; activation rate determines who engages with channel content; channel engagement determines whether members build the peer connections that drive NPS; NPS and moderation load together determine the member experience that the economics respond to. Reviewing any layer before the layers that cause it produces conclusions about the symptom with no path to the cause. The “decision it informs” column identifies the strategic decision that each layer feeds directly.

Layer	What to measure	Where to get the data	Healthy threshold	Worrying threshold	Decision it informs
1. Cohort retention rate	For each join-month cohort: percentage of original members still active (not cancelled) at months 1, 3, 6, 9, and 12. Build a table with join-month as rows and months-since-join as columns.	Billing system (Stripe, Memberful, Gumroad, etc.) for membership status by join month. Export active vs. cancelled by join date. Activity threshold: at least one message posted in the 30-day window for the relevant month-since-join column.	Retention pattern is consistent across cohorts at each tenure window with no single cohort dropping significantly faster than peers at the same tenure. Specific number benchmarks vary by price point; look for pattern consistency, not absolute numbers.	Consistent drop at months 2–3 across all cohorts (onboarding problem). Steep drop at months 5–8 in specific cohorts but not others (calendar-period programming problem). Decline beginning at months 8–12 for all cohorts after a stable months 1–6 (long-tenure value problem).	Onboarding investment, programming calendar, long-tenure programming. Also: whether to diagnose economics as an acquisition problem or a retention problem (requires Layer 6 data but Layer 1 answer first).
2. Activation rate trend	Monthly: (members who posted at least once within 30 days of joining) ÷ (total new members that month). Track on a consistent threshold — once set, do not change the definition. Plot 12 monthly points.	Community platform message history. Count distinct member IDs with at least one post in the 30-day window following their join date, grouped by join month. Export from Slack, Circle, Discord, or the platform’s data export.	Trend line is flat or improving month-over-month. Absolute rate depends on size bracket (see Table 3). The key signal is direction, not a single month’s number.	Declining trend over 3+ consecutive months. Any month more than 15 percentage points below the prior 3-month average without a seasonal explanation (see Table 3 for seasonal adjustment flags). December and August dips of up to 15 pp are expected and do not signal an onboarding problem.	Onboarding sequence investment (Day 0, Day 3, Day 7 touches), welcome message quality, channel taxonomy, intro prompt. Activation rate trend is the leading indicator for 6-month retention: members who do not activate in month 1 cancel at 2–3× the rate of members who do.
3. Content engagement by channel	For each channel: engagement multiplier = (member replies per operator post per month). Calculate for each active channel over the full 12-month period. Identify channels consistently above 3.0, between 1.0 and 3.0, and consistently below 1.0.	Community platform message history. Count operator posts (posts from the operator account) and member replies (replies to those posts) per channel per month. Reference channels (e.g. #resources, #jobs) where the value is in the content not the discussion may have legitimately low engagement multipliers — flag these separately before applying the threshold.	Engagement multiplier >3.0 for discussion-generative channels (the channel is producing more member dialogue per operator post than the operator is contributing). Multiplier of 1.0–3.0 for most channels is neutral — review the content type before acting.	Engagement multiplier <1.0 consistently over 3+ months for a discussion-generative channel (the operator is generating more posts than members are replying to, meaning the channel is not producing member-to-member dialogue). Exception: reference channels where members consume but do not reply (e.g. #resources, #tools) may have legitimate multipliers below 1.0 — measure download, pin, or save rates instead for those channels.	Channel restructuring (rename, archive, or merge low-engagement channels), content format changes, and the decision of where to invest operator posting time in the coming year. Channels consistently above 3.0 are candidates for format duplication in adjacent topics.
4. Moderation load	Incidents per 100 active members per month. Track separately for new members (first 90 days) and long-tenure members (90+ days). Plot monthly over 12 months to identify trend direction.	Moderation log (internal spreadsheet, Notion page, or platform flag log). If no moderation log exists, start one now and reconstruct last year’s incidents from memory as a baseline estimate. The trend line matters more than any single data point: even imprecise historical data enables directional analysis.	Below 0.5 incidents per 100 active members per month. Trend line flat or declining. New-member incident rate higher than long-tenure rate is normal (new members have not yet internalized community norms).	Above 0.5 incidents per 100 active per month and rising over 3+ months. Long-tenure incident rate rising faster than new-member rate (signals norm drift at scale, not just onboarding failure). Any month above 1.0 incidents per 100 active requires immediate investigation regardless of trend.	Rule update, onboarding norm-setting investment, channel structure changes, and the decision of whether to hire or appoint a community moderator. Rising long-tenure incident rate is the specific signal that warrants a rules review before any onboarding change.
5. NPS trend	Quarterly Net Promoter Score from a 3-question survey (NPS question + one reason-for-score open text + one top-priority improvement open text). Use 4 quarterly data points as the trend; do not use a single annual survey.	Survey tool (Typeform, Google Forms, or in-platform poll). Send to all active members. Keep it to 3 questions maximum — surveys longer than 3 questions in paid communities drop response rate below 20%, which makes the sample unrepresentative of the membership at large.	NPS trend flat or improving quarter-over-quarter. Absolute NPS benchmarks vary widely; directional trend is the signal. Promoter comments mentioning specific channels, formats, or people identify what to protect and scale. Detractor comments mentioning the same specific issue across multiple respondents identify what to fix.	NPS declining for 2+ consecutive quarters while operational metrics (retention, activation) are flat or improving — this gap signals that members are experiencing a quality shift not yet visible in behavioural data (often a community culture shift, a prominent member departure, or a format that once felt premium now feels routine). NPS leading behavioural metrics by 1–2 quarters is well-documented in membership products.	Programming priorities, community culture decisions, and the decision of whether member-facing improvements to the existing community or new-member acquisition is the higher-leverage investment. NPS trend is also the direct input to the “what to scale” section of the decision output in Table 6.
6. Economics	LTV by cohort (total revenue from cohort ÷ cohort size, tracked to the end of the review year). CAC trend quarterly (total acquisition spend ÷ new members that quarter). MRR vs. target for each month, with deviation months flagged.	Billing system for revenue by join-month cohort. Ad spend, tool, and content costs by quarter for CAC. MRR from billing system monthly. LTV by cohort requires the same join-month grouping used in Layer 1 — run both from the same billing export for consistency.	LTV trending upward in newer cohorts vs. older cohorts at the same tenure window (improving retention is showing up in LTV). CAC trend flat or declining. MRR on or above target for 8+ of 12 months.	LTV declining in newer cohorts at the same tenure window compared to older cohorts (retention problem showing up in revenue). CAC trend rising faster than LTV trend (deteriorating unit economics). MRR below target for 3+ consecutive months without a deliberate pricing or volume decision that explains the gap.	Pricing changes, acquisition investment decisions, and the kill-or-continue decision if economics are deteriorating despite flat operational metrics. Economics that are deteriorating alongside deteriorating Layer 1–3 metrics confirm the diagnosis; economics that are deteriorating despite healthy Layers 1–3 point to a pricing or positioning problem rather than a community quality problem.

The most common annual review failure is running Layer 6 (economics) before Layers 1–3. When MRR is below target, the instinctive response is to look at acquisition — but MRR is downstream of retention, and retention is downstream of activation, and activation is downstream of the first-week experience. An operator who fixes acquisition without first understanding why Layer 1 cohort retention is declining will refill a leaky bucket. Run the tables in sequence. Economics are the readout of the prior five layers, not the starting point for the diagnosis.
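The Layer 1 cohort table (join month as rows, months-since-join as columns) can be built from a billing export along the following lines. This is a sketch under stated assumptions: the input shape `(join_date, cancel_date or None)`, the `as_of` review date, and the rule that a member counts as retained at month N only if they cancelled more than N calendar months after joining are all choices made here for illustration. It uses billing status only and does not apply a posting-activity threshold.

```python
from collections import defaultdict
from datetime import date

CHECKPOINTS = (1, 3, 6, 9, 12)  # months-since-join columns from Layer 1


def month_index(d: date) -> int:
    """Map a date to a running month count so month differences are a subtraction."""
    return d.year * 12 + d.month


def cohort_retention(members, as_of: date):
    """members: iterable of (join_date, cancel_date or None).

    Returns {"YYYY-MM": {checkpoint: percent retained, or None if not yet reached}}.
    """
    cohorts = defaultdict(list)
    for join_date, cancel_date in members:
        cohorts[join_date.strftime("%Y-%m")].append((join_date, cancel_date))

    table = {}
    for join_month, rows in sorted(cohorts.items()):
        age = month_index(as_of) - month_index(rows[0][0])
        table[join_month] = {}
        for n in CHECKPOINTS:
            if age < n:
                table[join_month][n] = None  # cohort has not reached this tenure yet
                continue
            # Assumption: retained at month n means not cancelled, or cancelled
            # more than n calendar months after the join month.
            retained = sum(
                1 for joined, cancelled in rows
                if cancelled is None
                or month_index(cancelled) - month_index(joined) > n
            )
            table[join_month][n] = round(100 * retained / len(rows), 1)
    return table
```

Leaving unreached checkpoints as `None` rather than 0 keeps young cohorts from looking like they churned completely.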

Table 2 — Cohort retention rate interpretation

Four diagnostic patterns to look for in the cohort retention table built in Layer 1. Each pattern has a specific signal, a most likely cause, the data that confirms it, and the decision it points to. One community can exhibit multiple patterns in the same review year — read all four rows before drawing conclusions about which pattern dominates.

Pattern	What it looks like in the table	Most likely cause	Data that confirms it	Decision it points to
Consistent drop at months 2–3 across all cohorts	Every join-month row shows a drop of 10%+ between the month-1 column and the month-3 column, regardless of which calendar period the cohort joined. The drop appears for January cohorts, April cohorts, and October cohorts equally.	Onboarding problem. Members across all cohorts are hitting the same friction point at the same tenure window. Because the pattern appears regardless of calendar period, the cause is structural — it exists in the onboarding sequence or welcome experience that all members go through, not in a programming event that only happened in certain months.	Cross Layer 1 data with Layer 2 (activation rate): if activation rate is below the healthy benchmark for the community’s size bracket (Table 3), and the month-2 drop in Layer 1 is consistent across cohorts, the two together confirm an onboarding problem. Additional confirmation: if the drop is concentrated in members who did not post in week one, the activation failure is occurring before the retention decision.	Invest in the Day 3 and Day 7 onboarding touches. Fix the first-week experience before any other programming change. See paid community metrics dashboard for the measurement setup that surfaces this pattern in real time rather than at the annual review.
Steep drop at months 5–8 in specific cohorts, flat in others	The month-5 to month-8 drop is large (10%+) for cohorts who joined in certain months (e.g. January and February cohorts) but not for cohorts who joined in other months (e.g. June and July cohorts). The same tenure window shows very different outcomes depending on when the member joined.	Calendar-period programming problem. The cohorts experiencing the months 5–8 drop are the ones who hit a specific calendar window — a low-activity summer period, a holiday season without programming, or a period when a flagship event or content series was not running — at the 5–8 month tenure mark. Members with less tenure (months 1–3) power through low-activity periods because they are still in the discovery phase; members at months 5–8 who have seen the full programming cycle and hit an empty period conclude the community has peaked.	Cross Layer 1 data with the programming calendar: identify which calendar months correspond to the months 5–8 window for the affected cohorts. If the programming calendar shows a reduced cadence or a missing series in that period, the calendar correlation confirms the cause. Cross with Layer 3 (channel engagement): if the months corresponding to the affected tenure window also show engagement multiplier drops across channels, the programming gap is visible in both the cohort data and the channel data.	Redesign the programming calendar to fill the specific gap period. If January and February cohorts are dropping at months 5–8 (which places them in June–September), the fix is a summer programming investment — a series, a live event, or a content format that specifically runs June–September. See paid community audit checklist for the programming calendar review that identifies these gaps outside of the annual review cycle.
Flat through month 6, then decline at months 8–12	Cohorts retain well through months 1–6 (the drop between month 1 and month 6 is under 10%), but show a steeper drop from month 6 to month 12. This pattern is consistent across all or most cohorts — it is not specific to certain join months. Members engage through the first six months and then cancel in the second half of their membership year.	Long-tenure value problem. The community has a strong onboarding experience and sufficient programming to carry members through the first half of the year, but does not have programming, content, or peer-connection depth that justifies continued membership at the 8–12 month mark. At months 8–12, members have seen the full programming cycle once and are evaluating whether the coming year justifies renewal. If the answer from their perspective is “I’ve gotten what I came for,” they cancel before renewal.	Cross Layer 1 data with Layer 5 (NPS trend): if NPS is declining in Q3 and Q4 of the review year (when the affected cohorts are at months 8–12), and detractor comments reference “not much new content” or “covered what I needed,” the NPS qualitative data confirms the long-tenure value gap. Additional confirmation: if month-12 retention is higher for cohorts who participated in a specific event or series in months 8–12, the event is providing the long-tenure value that other members are not getting.	Design a months 8–12 programming tier: an annual flagship event, a capstone project series, an alumni network layer, or a mentorship pairing that gives long-tenure members a reason to stay that is distinct from the reasons new members join. The fix is not to add more of what worked in months 1–6 — those members have already consumed that format. The fix is to give them a next-level experience that only makes sense after 6+ months of membership.
Newer cohorts retaining better than older cohorts at the same tenure window	January 2025 cohort retains at a higher rate at month 6 than January 2024 cohort retained at month 6. The improvement is consistent across multiple cohort comparisons — not a single month outlier. Newer cohorts are systematically retaining better than older cohorts at equivalent tenure windows.	Positive signal: an operational improvement made in the current year is producing measurably better retention than the prior year’s equivalent cohorts. This is the best-case pattern in the annual review — it shows that something you changed is working and that the change has a measurable impact on the metric that matters most.	Cross Layer 1 data with the change log: identify what changed between the prior year’s cohorts and the current year’s cohorts. Changes to look for: new onboarding sequence (Day 0, Day 3, Day 7 touches added or revised), new channel structure, new monthly programming series, pricing change, intake process change. If one change correlates strongly with the retention improvement across cohorts, that change is the one to protect and build on.	Identify the specific change that correlates with the improvement and confirm it is durable. Scale the change: if a new onboarding DM sequence drove the improvement, systematize it and apply it to every new member. Use the finding as the “what to scale” item in Table 6 (decision output format). See paid community survey templates for the quarterly survey that surfaces early signals of this kind of positive trend before the annual review.
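The four patterns in Table 2 can be checked mechanically once the cohort table exists. The sketch below assumes a table shaped as `{"YYYY-MM": {1: pct, 3: pct, 6: pct, 9: pct, 12: pct}}` with `None` for missing values, reads the "10%+" drop as 10 percentage points, uses the month 6 to month 9 columns as a stand-in for the months 5–8 window (the Layer 1 columns do not include months 5 and 8), and treats "all or most cohorts" as a simple majority. All function names are hypothetical, and the output is a prompt for the cross-checks in the table, not a diagnosis.

```python
DROP_THRESHOLD_PP = 10.0  # Table 2's "10%+", read here as percentage points (assumption)


def drop(row, start, end):
    """Percentage-point fall between two tenure checkpoints, or None if either is missing."""
    if row.get(start) is None or row.get(end) is None:
        return None
    return row[start] - row[end]


def split_by_drop(table, start, end):
    """Return (cohorts at or above the drop threshold, all cohorts with data for the window)."""
    measured = {c: drop(r, start, end) for c, r in table.items()}
    measured = {c: d for c, d in measured.items() if d is not None}
    dropping = [c for c, d in measured.items() if d >= DROP_THRESHOLD_PP]
    return dropping, list(measured)


def diagnose(table):
    """Return labels for the Table 2 patterns that the cohort table appears to show."""
    findings = []

    early, early_all = split_by_drop(table, 1, 3)
    if early and len(early) == len(early_all):
        findings.append("onboarding problem")

    # Months 6 -> 9 stand in for the months 5-8 window (nearest Layer 1 columns).
    mid, mid_all = split_by_drop(table, 6, 9)
    if mid and len(mid) < len(mid_all):
        findings.append("calendar-period programming problem: " + ", ".join(mid))

    late, late_all = split_by_drop(table, 6, 12)
    stable_then_late = [
        c for c in late
        if (d := drop(table[c], 1, 6)) is not None and d < DROP_THRESHOLD_PP
    ]
    if late_all and len(stable_then_late) > len(late_all) / 2:
        findings.append("long-tenure value problem")

    return findings


def improved_vs_prior_year(table, checkpoint=6):
    """Cohorts that beat the same join month one year earlier at the given checkpoint."""
    better = []
    for cohort, row in table.items():
        year, month = cohort.split("-")
        prior = table.get(f"{int(year) - 1}-{month}")
        if prior and row.get(checkpoint) is not None and prior.get(checkpoint) is not None:
            if row[checkpoint] > prior[checkpoint]:
                better.append(cohort)
    return better
```

Because one community can exhibit several patterns in the same year, `diagnose` returns a list rather than a single label.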

Table 3 — Activation rate thresholds by community size

Activation rate benchmarks vary by community size because smaller communities have structural activation advantages: the operator can personally DM every non-poster by Day 3 in a 150-member community, but cannot maintain that capacity in a 600-member community without automation or a dedicated community manager. The size brackets below reflect this structural reality. Seasonal adjustment flags are included because December and August are the two months where a legitimate activation dip should not be diagnosed as an onboarding problem without checking community-wide activity in the same period.

Community size	Healthy	Worrying	Critical	Seasonal adjustment flags	What the size bracket changes
100–200 members	55–70%	40–55%	Below 40%	December and August: benchmarks shift down 10–15 pp. A 45% activation rate in December for a 150-member community is equivalent to 55–60% in other months. Check whether community-wide message volume also dropped before diagnosing as an onboarding problem.	Operators in this bracket can personally DM every non-poster by Day 3 without automation. That personal-scale outreach is a structural advantage and produces higher activation rates than at any larger bracket. If activation rate is in the worrying zone in this bracket, the problem is almost certainly in the Day 0 or Day 3 message quality, not in capacity.
200–500 members	45–60%	35–45%	Below 35%	December and August: benchmarks shift down 10–15 pp. A 35% activation rate in August for a 350-member community is within expected seasonal range if community-wide message volume also declined. Compare against the prior-year August activation rate for the same community for the most accurate seasonal baseline.	At this size bracket, personal DMs to every non-poster require 15–45 minutes per day in active growth periods. Most operators begin automating the Day 3 non-poster nudge at this bracket. Activation rate in the worrying zone at this bracket typically reflects a transition gap — the operator has grown beyond personal-scale capacity but has not yet automated the Day 3 recovery touch.
500–1,000 members	35–50%	25–35%	Below 25%	December and August: benchmarks shift down 10–12 pp. Communities in this bracket often have enough active members that community-wide message volume holds better than in smaller communities during holiday periods, so the seasonal dip may be less pronounced than at smaller brackets. Use prior-year data for the same community rather than generic seasonal estimates.	At this size bracket, automated Day 0 and Day 3 touches are table stakes. The activation rate floor is higher than at 1,000+ members because channels still have enough personal-recognition density that new members can identify community personalities and build peer connections in the first week. The primary risk at this bracket is channel proliferation: too many channels with thin activity make lurking the default, because new members cannot identify where the community actually lives.
1,000+ members	25–40%	18–25%	Below 18%	December and August: benchmarks shift down 8–12 pp. At 1,000+ members, the community has sufficient mass that some channels remain active even during low-activity periods — the seasonal dip is typically less severe in absolute terms than at smaller brackets, but may still be significant as a percentage of expected activation rate.	At 1,000+ members, operator-personal activation leverage is effectively zero for new members who do not already know the operator. Activation at this bracket is almost entirely driven by peer connections formed in the first week: whether a new member receives replies to their intro post from existing members, not from the operator. Programming that creates “welcome ambassador” roles for long-tenure members to reply to new intros is the highest-leverage activation investment at this bracket.

The single most important rule for using Table 3 is threshold consistency. If your activation threshold is “posted at least one message within 30 days of joining,” it must be exactly that definition for every month in the trend. Operators who informally raise their standard mid-year (“I’m now thinking activation means they commented in a discussion, not just posted an intro”) produce a trend line that appears to show deterioration when it actually shows a definition change. Set the threshold, write it down, and do not change it without resetting the baseline.
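The activation definition and the Table 3 brackets can be sketched together as follows. The 30-day window is fixed as a module constant so the definition cannot drift mid-year. Several choices are assumptions made for this sketch: the input shapes (`joins` as a member-to-date mapping, `posts` as `(member_id, post_date)` pairs), assigning boundary sizes such as 200 to the larger bracket, and applying the low end of each seasonal shift range in August and December.

```python
from datetime import date, timedelta

ACTIVATION_WINDOW = timedelta(days=30)  # fixed definition; do not change mid-year

# (min size, healthy floor %, worrying floor %, seasonal shift in pp) from Table 3.
# The seasonal shift uses the low end of each published range (assumption).
BRACKETS = [(1000, 25, 18, 8), (500, 35, 25, 10), (200, 45, 35, 10), (100, 55, 40, 10)]
SEASONAL_MONTHS = {8, 12}  # August and December


def activation_rate(joins, posts, year, month):
    """joins: {member_id: join_date}; posts: iterable of (member_id, post_date).

    Returns the percent of that month's new members who posted within the window.
    """
    cohort = {m: d for m, d in joins.items() if (d.year, d.month) == (year, month)}
    if not cohort:
        return None
    activated = {
        m for m, posted in posts
        if m in cohort and cohort[m] <= posted <= cohort[m] + ACTIVATION_WINDOW
    }
    return 100 * len(activated) / len(cohort)


def classify(rate, community_size, month):
    """Return 'healthy', 'worrying', or 'critical' for an activation rate in percent."""
    for min_size, healthy, worrying, shift in BRACKETS:
        if community_size >= min_size:
            if month in SEASONAL_MONTHS:
                healthy, worrying = healthy - shift, worrying - shift
            if rate >= healthy:
                return "healthy"
            return "worrying" if rate >= worrying else "critical"
    raise ValueError("Table 3 has no bracket below 100 members")
```

A seasonal "healthy" result is only a first pass; the table still asks for a check against community-wide message volume and the prior-year figure for the same month.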

Table 4 — Channel engagement multiplier

The engagement multiplier is calculated per channel as: member replies per operator post per month. A multiplier above 3.0 means that for every post the operator makes in that channel, members generate more than 3 replies on average — the channel is producing member-to-member dialogue beyond what the operator is inputting. A multiplier below 1.0 means the operator is generating more posts than members are replying to. The interpretation depends on the channel’s intended function: discussion-generative channels are expected to produce multipliers above 1.0; reference channels are not.

Multiplier range	Signal	For discussion-generative channels	For reference channels	Action
>3.0	Healthy — reinvest	The channel is producing more member-to-member dialogue than operator input. This is the target state for discussion channels: the operator seeds the discussion and the community sustains it. Channels in this zone are candidates for topic duplication (create an adjacent channel on a related topic using the same content format).	Reference channels (e.g. #resources, #tools, #jobs) where the value is in the content not the replies will rarely reach 3.0 on the reply-per-operator-post metric. If a reference channel is above 3.0, it has become a discussion channel in practice — this is not a problem, but note that the channel’s function has evolved and manage it accordingly.	Protect the format and content type producing this multiplier. Use the top-performing posts in this channel as the template for content in lower-performing channels. If operator time is constrained, allocate it first to channels in this zone.
1.0–3.0	Neutral — review content type	The channel is producing member replies, but not at a rate that suggests members are driving the discussion independent of operator input. This is the most common zone for channels that are performing adequately but have not yet found the content format that produces member-to-member dialogue. Review the format of the posts generating the highest reply counts in this zone — those formats are candidates for higher frequency.	Reference channels in the 1.0–3.0 range are performing well: members are engaging with the content the operator posts, which means the content is relevant and discoverable. No action needed on a reference channel in this range unless the saves, pins, or downloads metric is also low.	Identify the 3 posts in this channel that produced the highest reply count in the past 12 months. What format are they? (Question prompt, specific resource share, case study request, hot take?) Increase the frequency of that format. If no clear format emerges from the top-3 analysis, the channel may need a stronger topic constraint.
<1.0 consistently (3+ months)	At risk — restructure candidate	For discussion-generative channels: the operator is generating more posts than members are replying to. Over three months, this pattern indicates that members are not finding the channel content relevant enough to respond to. This is the primary restructure or archive signal. Before archiving, run one format experiment: a question-only month where every operator post in the channel is a direct question to members. If the multiplier does not improve in the question-only month, the channel topic does not produce member dialogue and the channel should be archived or merged.	For reference channels: a multiplier below 1.0 is expected and is not a restructure signal on its own. Reference channels are consumed, not discussed. The right metric for reference channels is saves, pins, or link clicks per post, not the reply-per-post multiplier. If saves are also low on a reference channel, the content is not relevant to the current membership — that is the restructure signal, not the reply count.	Discussion channels: run a one-month question-only experiment, then decide restructure vs. archive. Reference channels: check saves/pins/clicks. If all three are low, archive. If only reply count is low but saves/pins are healthy, the channel is functioning correctly as a reference resource and no action is needed.
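The multiplier and the three zones in Table 4 can be sketched as below. The formula (member replies divided by operator posts, per channel per month) comes from the card. Reading "consistently below 1.0" as the three most recent months all under 1.0, and judging the healthy zone on the average across the supplied months, are assumptions of this sketch, as are the function names.

```python
def engagement_multiplier(operator_posts, member_replies):
    """Member replies per operator post for one channel-month; None if no operator posts."""
    if operator_posts == 0:
        return None
    return member_replies / operator_posts


def channel_signal(monthly_multipliers, is_reference=False):
    """monthly_multipliers: chronological multipliers for one channel (None = no posts)."""
    values = [m for m in monthly_multipliers if m is not None]
    if not values:
        return "no data"

    # Assumption: "consistently" means the three most recent months are all below 1.0.
    recent = values[-3:]
    if len(recent) == 3 and all(m < 1.0 for m in recent):
        # Reference channels are judged on saves, pins, or clicks, not replies.
        return "check saves/pins/clicks" if is_reference else "restructure candidate"

    average = sum(values) / len(values)
    if average > 3.0:
        return "healthy: reinvest"
    return "neutral: review content type"
```

Flagging reference channels with `is_reference=True` before classifying mirrors the card's instruction to separate them out before applying the threshold.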

Table 5 — Moderation load benchmarks

Moderation load is measured as incidents per 100 active members per month. “Active members” is defined as members who posted at least once in the month; inactive members do not generate moderation incidents at meaningful rates. “Incident” is defined as any event requiring operator or moderator intervention: a guideline violation, a member conflict, a spam post, or a member complaint. The new-member vs. long-tenure breakdown is important because the two populations have different causes for incidents and require different responses.

Benchmark zone	Incidents per 100 active/mo	New-member (first 90 days) pattern	Long-tenure (90+ days) pattern	Primary signal and action
Healthy	<0.5	A new-member incident rate at or below the community average is normal in this zone: members who join a well-run community with clear rules and a strong onboarding sequence rarely violate norms in the first 90 days. A new-member incident rate significantly above the community average (e.g. 2×+ the long-tenure rate) suggests the onboarding sequence is not setting norm expectations effectively.	A long-tenure incident rate below 0.5 per 100 active per month indicates a stable community culture where norms are internalized and members self-moderate peer behavior. Incidents in this zone are typically edge cases (a long-tenure member having a bad week, an isolated conflict) rather than systemic patterns.	No structural action needed. Maintain the moderation log and continue monitoring. If the trend line is flat at this level across all 12 months, note it as a positive signal in the decision output (Table 6) and identify what practices are producing the stability.
Watch	0.5–1.0	New-member incidents in the watch zone may indicate an onboarding norm-setting gap: members are not understanding the community’s behavioral expectations from the Day 0 and Day 7 sequence. Review the rules page and the Day 0 DM for the specificity of the norm expectations. Generic rules (“be respectful”) produce watch-zone incident rates for new members because they cannot be self-applied; specific rules (“do not promote your own work without first making three contributions to others in the prior 30 days”) can be self-applied and produce fewer incidents.	A long-tenure incident rate in the watch zone, combined with a rising trend, is the most important signal in the moderation table: it indicates community norm drift at scale. As communities grow, the proportion of long-tenure members who were onboarded under the original culture decreases, and new behavioral patterns emerge from the larger member pool. Rising long-tenure incidents require a rules review and a culture-reset post, not just increased enforcement.	Identify whether the watch-zone rate is driven primarily by new members or long-tenure members. New-member-driven: fix the onboarding norm-setting sequence. Long-tenure-driven: conduct a rules review and identify whether the drift is from specific new members who are modeling different norms or from a community-wide culture shift.
Intervention required	>1.0	New-member incident rate above 1.0 per 100 active per month indicates a systemic onboarding failure: the community is not communicating its norms to new members at all, or the norms are not being internalized by a significant proportion of new members. At this rate, the operator is spending meaningful time on moderation work that should be handled by the onboarding sequence. Intervention: rewrite the Day 0 DM to include explicit norm statements, rewrite the community rules page to use specific behavioral language, and add a Day 7 message that references the rules page.	Long-tenure member incident rate above 1.0 per 100 active per month is a community health emergency. This rate indicates that experienced members who understand the community’s history are nevertheless generating incidents at a rate that requires regular operator intervention — which means either the rules are not being enforced consistently, the community has scaled beyond the operator’s moderation capacity, or a subset of long-tenure members are modeling behavior that normalizes guideline violations. Intervention: audit the most recent 10 incidents for the long-tenure group; identify whether there is a common triggering context, a common set of members involved, or a common type of violation.	Immediate intervention required regardless of whether the high rate is driven by new members or long-tenure members. For new-member-driven rates: rewrite onboarding norm-setting content. For long-tenure-driven rates: conduct a moderation audit of recent incidents, identify the pattern, and address it directly with the members involved and with a community-wide norm restatement. See Foothold community health check for a real-time moderation load tracking setup.
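
The zone boundaries in the table above can be applied mechanically once the incident count is normalized per 100 active members. The following is a minimal sketch, not a prescribed implementation: the function names are hypothetical, and treating a rate of exactly 1.0 as still in the watch zone is an assumption, since the table only states 0.5–1.0 and >1.0.

```python
def incidents_per_100_active(incidents: int, active_members: int) -> float:
    """Monthly incident count normalized per 100 active members."""
    if active_members <= 0:
        raise ValueError("active_members must be positive")
    return incidents * 100.0 / active_members


def benchmark_zone(rate: float) -> str:
    """Map a per-100-active monthly rate to the zones in the table above.

    Assumption: exactly 1.0 is treated as the top of the watch zone.
    """
    if rate < 0.5:
        return "Healthy"
    if rate <= 1.0:
        return "Watch"
    return "Intervention required"


def onboarding_norm_gap(new_member_rate: float, long_tenure_rate: float) -> bool:
    """Flag the '2x+ the long-tenure rate' example from the Healthy row."""
    return long_tenure_rate > 0 and new_member_rate >= 2 * long_tenure_rate


# Example: 3 logged incidents in a month with 400 active members.
rate = incidents_per_100_active(3, 400)   # 0.75 per 100 active
zone = benchmark_zone(rate)               # "Watch"
```

Running the same two functions separately on the new-member and long-tenure incident counts gives the breakdown that the decision column asks for.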

The trend direction matters more than the absolute rate for any single month. A community at 0.4 incidents per 100 active per month but trending up for three consecutive months deserves more attention in the annual review than a community at 0.7 that has been flat for 12 months. The flat 0.7 is a stable (if elevated) equilibrium; the rising 0.4 is a community in transition whose moderation load will be in the intervention zone within a quarter if the trend continues. Build the 12-month trend line, not just the end-of-year number.
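
One way to build that 12-month trend line is a least-squares slope over the monthly rates plus a count of the most recent consecutive rising months. This is a sketch under assumptions: the function names are hypothetical, and the three-month streak used as the attention trigger is taken from the example in the paragraph above, not from a stated rule.

```python
def trend_slope(monthly_rates: list[float]) -> float:
    """Least-squares slope of rate vs. month index (change in rate per month)."""
    n = len(monthly_rates)
    if n < 2:
        raise ValueError("need at least two months of data")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_rates) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(monthly_rates))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def consecutive_rising_months(monthly_rates: list[float]) -> int:
    """Count how many of the most recent months rose over the month before."""
    streak = 0
    for i in range(len(monthly_rates) - 1, 0, -1):
        if monthly_rates[i] > monthly_rates[i - 1]:
            streak += 1
        else:
            break
    return streak


def needs_attention(monthly_rates: list[float], min_streak: int = 3) -> bool:
    """Assumed trigger: a rising streak of min_streak months or more."""
    return consecutive_rising_months(monthly_rates) >= min_streak


flat_elevated = [0.7] * 12                      # stable equilibrium
rising_low = [0.2] * 9 + [0.25, 0.3, 0.4]       # three rising months at year end
```

With these inputs, `needs_attention(rising_low)` is true while `needs_attention(flat_elevated)` is false, which mirrors the comparison in the text even though the flat community has the higher absolute rate.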

Table 6 — Annual review decision output format

The decision output is the only deliverable from the annual review that actually matters. A review that produces observations without decisions has not been completed. The three-section format limits decisions to three items per section so that the review produces an actionable list, not a prioritization problem. Every item must have an owner (who is responsible for the change) and a timeline (when it will be done). The most common failure mode is favoring completeness over commitment: a review that produces 15 observations and 8 decisions with no owners produces exactly zero changes in the following year, because no specific person was accountable for any specific item by any specific date.

Section	Max items	Required for each item	What counts as a valid item	Common failure
What to fix	3	Specific change (not a goal). Owner (person’s name, not “the team”). Timeline (specific date, not “Q1”). Which review layer diagnosed it (Layers 1–6 from Table 1).	“Rewrite the Day 3 non-poster DM to include a specific channel recommendation based on the member’s stated goal. Owner: [name]. Done by: February 15. Diagnosed by: Layer 2 activation rate trend declining in H2.” The item names the specific change, the owner, the deadline, and the data that identified it. Not valid: “Improve onboarding” with no specific action, no owner, and no deadline.	Listing more than 3 fix items. Operators who identify 7 things to fix and list all 7 produce a fix list that does not get fixed. Priority forces commitment. If you have identified more than 3 things to fix from the six-layer review, rank them by impact on Layer 1 cohort retention (the root metric) and take the top 3. The other 4 items go in a parking lot to be reviewed at the mid-year check-in.
What to stop	3	Specific thing being stopped (not a category). Rationale (which review data supports stopping it). Owner of the stop action. Date by which it will be stopped or wound down.	“Archive the #off-topic channel. Engagement multiplier has been below 0.5 for 8 consecutive months (Layer 3 data). Owner: [name]. Archived by: January 31. Members will be DM’d a week before archive explaining the consolidation.” Not valid: “Stop doing things that don’t work” — a category stop with no specific action produces no actual stopping.	Treating “what to stop” as optional. Most annual reviews produce long “what to start” and “what to fix” lists and nothing in the “what to stop” section. This is a resource allocation error: every activity the community continues consumes operator attention that cannot be redirected to what to fix or what to scale. A review that does not stop anything has not freed up the capacity to do the fixes.
What to scale	3	Specific thing being scaled (not a category). Evidence that scaling it will improve a Layer 1–6 metric (not just that members said they liked it). Owner. Scaled-by date. What “scaled” means specifically (2× frequency? Expanded to a second channel? New member segment included?).	“Double the frequency of the ‘ask me anything’ async thread prompt from bi-weekly to weekly in Q1. Rationale: this format produced the top-3 engagement multiplier scores in the channel review (Layer 3) and is mentioned positively in 40% of Q4 NPS detractor comments as ‘the kind of thing I want more of.’ Owner: [name]. Weekly cadence starts: January 8.” Not valid: “Do more of what members enjoy” — non-specific, no metric connection, no owner.	Scaling things that members say they like but that have no measurable connection to Layer 1–3 metrics. Member preference is a signal; metric connection is the evidence. If you cannot draw a line from the thing you want to scale to cohort retention (Layer 1), activation rate (Layer 2), or channel engagement (Layer 3), you are scaling for member satisfaction without evidence that it improves community health. Scale what the data says improves retention and activation, informed by what members say they value.

The decision output format is also a forcing function for the review process. If you cannot produce three specific fix items, three specific stop items, and three specific scale items with owners and timelines, the review data has not been examined at sufficient depth. “The review did not surface any clear fixes” is almost never true in a community that has operated for 12 months; it usually means the six-layer analysis was run at too high a level of abstraction (looking at totals, not cohort-level patterns) to produce actionable findings. Run each layer in Table 1 at the level of detail described in its row, and the decision output will follow.
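
The constraints in Table 6 (three items per section, a named owner, a specific date, a supporting review layer) are simple enough to check automatically before the review is signed off. The sketch below is illustrative only: `DecisionItem`, `validate`, and the list of vague owner strings are hypothetical names, and storing a single `layer` number for stop and scale items as well as fix items is an assumption.

```python
from dataclasses import dataclass
from datetime import date

MAX_ITEMS_PER_SECTION = 3
SECTIONS = ("fix", "stop", "scale")
VAGUE_OWNERS = {"", "the team", "team", "everyone"}  # hypothetical list


@dataclass
class DecisionItem:
    section: str   # "fix", "stop", or "scale"
    change: str    # the specific change, not a goal
    owner: str     # a person's name, not "the team"
    due: date      # a specific date; a date type rules out "Q1"
    layer: int     # review layer (1-6) whose data supports the item


def validate(items: list[DecisionItem]) -> list[str]:
    """Return a list of problems; an empty list means the output is valid."""
    problems = []
    counts = {section: 0 for section in SECTIONS}
    for item in items:
        if item.section not in counts:
            problems.append(f"unknown section: {item.section!r}")
            continue
        counts[item.section] += 1
        if item.owner.strip().lower() in VAGUE_OWNERS:
            problems.append(f"{item.change!r}: owner must be a named person")
        if not 1 <= item.layer <= 6:
            problems.append(f"{item.change!r}: layer must be between 1 and 6")
    for section, n in counts.items():
        if n > MAX_ITEMS_PER_SECTION:
            problems.append(
                f"{section}: {n} items, maximum is {MAX_ITEMS_PER_SECTION}"
            )
    return problems
```

A check like this cannot judge whether a change is specific enough; it only catches the structural failures (too many items, missing owner, no supporting layer) that the table lists.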

Frequently asked questions

What should a paid community annual review include?

A complete paid community annual review covers six layers in a specific sequence: cohort retention rate, activation rate trend, content engagement by channel, moderation load, member NPS trend, and economics. The sequence is causal: each layer is produced by the layers before it. Cohort retention rate is the starting point because it determines how many members stay long enough to activate; economics are the final layer because they are the output of all five preceding layers. Reviewing economics before understanding retention and activation produces conclusions that cannot be acted on, because the economic result is downstream of the operational decisions that shaped it. Table 1 of this reference card covers each layer with the specific measurement method, data source, healthy threshold, worrying threshold, and the decision the layer informs. The review produces a decision output in three sections (what to fix, what to stop, and what to scale), each limited to three items, with an owner and a timeline for every item.

What is a healthy cohort retention rate for a paid community?

There is no single threshold number for healthy cohort retention because the right benchmark depends on community size, price point, and topic domain. What the annual review should look for is the pattern and direction of retention across cohorts, not a single retention number. The four diagnostic patterns are: consistent drop at months 2–3 across all cohorts (onboarding problem); steep drop at months 5–8 in specific cohorts only, while other cohorts are flat through that tenure window (calendar-period programming problem); flat through month 6 then decline at months 8–12 for all or most cohorts (long-tenure value problem); newer cohorts retaining better than older cohorts at the same tenure window (positive signal indicating an operational improvement that should be identified and scaled). Table 2 in this reference card covers all four patterns with what each looks like in the join-month table, the most likely cause, the data that confirms the cause, and the decision it points to.
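
Reading these patterns requires a join-month table: one row per cohort, one column per tenure month. A minimal sketch of building one from join and cancel dates follows. The input shape, the function names, and the definition of "retained at month k" (the member had not cancelled within k calendar months of joining) are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def retention_table(members, as_of: date, max_tenure: int = 12) -> dict:
    """Build {'YYYY-MM': [retention at tenure month 1..max_tenure]}.

    members: iterable of (join_date, cancel_date or None).
    A cell is None when the cohort is not yet old enough to be observed.
    """
    cohorts = defaultdict(list)
    for joined, cancelled in members:
        cohorts[(joined.year, joined.month)].append((joined, cancelled))

    table = {}
    for (year, month), rows in sorted(cohorts.items()):
        cohort_age = months_between(date(year, month, 1), as_of)
        row = []
        for k in range(1, max_tenure + 1):
            if k > cohort_age:
                row.append(None)
                continue
            retained = sum(
                1 for joined, cancelled in rows
                if cancelled is None or months_between(joined, cancelled) > k
            )
            row.append(retained / len(rows))
        table[f"{year}-{month:02d}"] = row
    return table
```

Comparing values down a single column (the same tenure month across cohorts) is what separates a drop shared by all cohorts from one confined to specific join months.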

What is a healthy activation rate for a paid community?

Activation rate benchmarks differ by community size because smaller communities have structural activation advantages that larger communities lose at scale. Typical healthy activation rates by community size: 100–200 members: 55–70%; 200–500 members: 45–60%; 500–1,000 members: 35–50%; 1,000+ members: 25–40%. Two important caveats apply. First, the activation threshold (typically “posted at least once within 30 days of joining”) must be consistent across all months in the trend — changing the definition mid-analysis makes the trend uninterpretable. Second, December and August typically produce activation rates 10–15 percentage points below the annual average because new members who join in those months join during periods of naturally lower community activity. Do not diagnose a December or August dip as an onboarding problem without checking whether overall community message volume also dropped in the same period. Table 3 in this reference card covers all four size brackets with worrying and critical thresholds and the structural explanation for why the benchmark shifts by size.
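
The size brackets and the seasonal caveat can be combined into a single lookup. This is a sketch under assumptions: the names are hypothetical, bracket boundaries are treated as lower-bound inclusive (so 200 members falls in the 200–500 bracket) because the ranges above overlap at their edges, and communities under 100 members return no benchmark because none is given here.

```python
# (lower bound inclusive, upper bound exclusive or None, healthy range)
HEALTHY_ACTIVATION = [
    (100, 200, (0.55, 0.70)),
    (200, 500, (0.45, 0.60)),
    (500, 1000, (0.35, 0.50)),
    (1000, None, (0.25, 0.40)),
]
SEASONAL_MONTHS = {8, 12}  # August and December join months


def healthy_range(member_count: int):
    """Return the (low, high) healthy activation range, or None if unlisted."""
    for low, high, rng in HEALTHY_ACTIVATION:
        if member_count >= low and (high is None or member_count < high):
            return rng
    return None


def assess_activation(member_count: int, rate: float, join_month: int) -> dict:
    """Compare a monthly activation rate with the bracket's healthy range."""
    rng = healthy_range(member_count)
    below = rng is not None and rate < rng[0]
    return {
        "healthy_range": rng,
        "below_healthy_range": below,
        # A dip in these months needs a message-volume check before it is
        # read as an onboarding problem.
        "check_seasonality_first": below and join_month in SEASONAL_MONTHS,
    }
```

For a 350-member community with a 40% activation rate among December joiners, the result flags both a below-range rate and the seasonality check; the same rate for March joiners flags only the former.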

When should a paid community do its annual review?

A January review covering the prior calendar year is the standard timing for two reasons: it aligns with the natural decision cycle for most operators (January is when pricing, programming, and infrastructure decisions are made for the year ahead), and it produces exactly 12 months of data with a clean start and end point. The 12-month minimum data requirement is the binding constraint: communities that launched mid-year will not have 12 full months of data to review in their first January. Mid-year launchers should use an anniversary-based review cycle: for example, if the community launched in August 2024, the first annual review runs in August 2025 covering August 2024–July 2025. The most important requirement is that the review is timed to a decision cycle, not to a reporting cycle: running a review in January and making programming, pricing, and infrastructure decisions by February is the functional sequence; running a review in March after the year’s decisions have already been made produces analysis that cannot be acted on until the following year.
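
The anniversary rule reduces to month arithmetic: the first review runs 12 months after the launch month and covers the launch month through the month before the review. A minimal sketch, with hypothetical function names, assuming the launch month counts as a full month of data as in the example above:

```python
from datetime import date


def add_months(year: int, month: int, n: int) -> tuple[int, int]:
    """Return (year, month) shifted forward by n months."""
    index = year * 12 + (month - 1) + n
    return index // 12, index % 12 + 1


def first_review_window(launch: date):
    """Return (review_month, window_start, window_end) as (year, month) tuples."""
    review_month = add_months(launch.year, launch.month, 12)
    window_start = (launch.year, launch.month)
    window_end = add_months(launch.year, launch.month, 11)
    return review_month, window_start, window_end


# Launched in August 2024: review in August 2025 covering Aug 2024 to Jul 2025.
review, start, end = first_review_window(date(2024, 8, 14))
```

A January launch falls out of the same rule: the window is January through December, and the review lands in the following January, matching the standard timing.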

Related reference cards

Paid community annual review: the six-layer framework — the companion post with the strategic reasoning behind the review sequence, the causal chain between layers, and the most common failure modes
Paid community metrics dashboard — the real-time measurement setup that surfaces the Layer 1–4 signals in this reference card throughout the year, not only at the annual review
Paid community audit checklist — mid-year audit template for operators who want to run a lightweight version of the annual review at the 6-month mark
Paid community survey templates — the quarterly 3-question survey format for Layer 5 (NPS trend) with question wording, cadence, and response rate guidance
Foothold community health check — automated Layer 1–4 tracking for paid Slack communities: cohort retention, activation rate, engagement multiplier, and moderation load in a single dashboard