Keyword Mode vs Semantic Mode in Lorebooks: Which Should You Use?


🎯 The Great Lorebook Debate

You've set up your lorebook perfectly:

  • ✅ Priority numbers assigned (1-10 scale)
  • ✅ Insertion order configured
  • ✅ Token budget optimized
  • ✅ Probability settings dialed in

But your entries still aren't firing correctly.

Scenario 1: User says "The ancient sword glows blue" but your "Magic Sword" entry doesn't trigger.

Scenario 2: User says "She drew her blade" and suddenly five different weapon entries fire at once.

Scenario 3: Your lorebook fires entries that are vaguely related but not actually relevant to the conversation.

What's going on?

You're using the wrong matching mode. ⚔️💀


🔍 The Two Matching Modes Explained

Lorebook systems (MegaNova Studio, SillyTavern, Janitor AI, NovelAI) use TWO fundamentally different approaches to determine when entries should fire:

Mode 1: Keyword Matching (Exact/Regex)

How it works:

User message: "The dragon breathed fire"
Lorebook entry keywords: ["dragon", "fire breath"]
Result: Entry fires ✅ (exact word match found)

What it does: Scans messages for exact keyword matches (or regex patterns).

Characteristics:

  • Deterministic - You know exactly what will trigger each entry
  • Predictable - Same input = same output every time
  • Precise - Only fires when specific words appear
  • Rigid - Won't catch synonyms or related concepts
  • Keyword-dependent - Users must use YOUR exact words
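
To make the keyword behavior concrete, here is a minimal Python sketch of whole-word, case-insensitive matching. It is an illustration only, not the actual matcher of any platform; the function name `keyword_fires` and its `whole_words` parameter are assumptions made for this example.

```python
import re

def keyword_fires(message: str, keywords: list[str], whole_words: bool = True) -> bool:
    """Return True if any keyword appears literally in the message (case-insensitive)."""
    for keyword in keywords:
        pattern = re.escape(keyword)  # treat the keyword as plain text, not regex
        if whole_words:
            pattern = rf"\b{pattern}\b"  # "dragon" will not match inside "dragons"
        if re.search(pattern, message, flags=re.IGNORECASE):
            return True
    return False

print(keyword_fires("The dragon breathed fire", ["dragon", "fire breath"]))  # True
print(keyword_fires("The wyrm exhaled flames", ["dragon", "fire breath"]))   # False
```

Note that "fire breath" on its own would not match "breathed fire": keyword mode looks for the literal phrase, which is the rigidity described above.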

Mode 2: Semantic Matching (Vector/Embedding)

How it works:

User message: "The wyrm exhaled flames"
Lorebook entry content: "Ancient dragons breathe fire"
Result: Entry fires ✅ (semantic similarity detected)

What it does: Uses AI embeddings to find entries with similar meaning to recent messages.

Characteristics:

  • Flexible - Catches synonyms, related concepts, paraphrases
  • Contextual - Understands meaning, not just words
  • Comprehensive - Finds relevant info even with different wording
  • Unpredictable - Can't know exactly what will trigger
  • Imprecise - May fire entries that are loosely related but not relevant
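
Semantic matching usually comes down to comparing embedding vectors, most commonly with cosine similarity. The sketch below uses made-up 3-dimensional vectors in place of real embeddings (real models produce hundreds or thousands of dimensions). The vectors, the 0.75 cutoff, and the function name are all assumptions for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (hypothetical values)
entry_vec = [0.9, 0.8, 0.1]      # "Ancient dragons breathe fire"
wyrm_vec = [0.8, 0.9, 0.2]       # "The wyrm exhaled flames"
bird_vec = [0.1, 0.0, 0.9]       # "The bird flew away"

THRESHOLD = 0.75  # assumed cutoff for this example
print(cosine_similarity(entry_vec, wyrm_vec) >= THRESHOLD)  # True
print(cosine_similarity(entry_vec, bird_vec) >= THRESHOLD)  # False
```

No word is shared between the entry and the "wyrm" message; the match comes entirely from the vectors pointing in a similar direction.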

⚔️ Keyword vs Semantic: Head-to-Head Comparison

Feature | Keyword Mode | Semantic Mode
Matching Method | Exact words or regex patterns | Embedding similarity (vectors)
Predictability | 100% deterministic | Probabilistic (varies by model)
Synonym Handling | Poor (needs explicit keywords) | Excellent (understands meaning)
Setup Complexity | Simple (list keywords) | Complex (configure embeddings)
Token Efficiency | High (only fires when needed) | Variable (may fire unexpectedly)
Best For | Critical rules, specific triggers | Atmospheric lore, flexible context
Debugging | Easy (check if keyword present) | Hard (why did this fire?)
Platform Support | Universal | Requires vector storage extension

📊 When to Use Keyword Mode

✅ Use Keyword Mode For:

Entry Type | Why Keyword Mode Wins | Example
Critical Rules | Must fire consistently, never miss | "Vampires burn in sunlight"
Character Identity | Core traits must always apply | "Character is immortal"
Magic Systems | Specific mechanics need exact triggers | "Fire magic requires mana cost"
Plot-Critical Info | Can't risk missing key story elements | "Character is secretly the king"
Combat Mechanics | Precise triggers for abilities | "Sword +1 grants +2 to hit"
Named Locations | Specific places need exact matches | "Winterfell", "King's Landing"
Character Names | Must match exactly, no ambiguity | "Jon Snow", "Daenerys"
Items/Equipment | Specific items with unique properties | "The One Ring", "Excalibur"

Keyword Mode Example: Magic System

═══════════════════════════════════════════════════
Entry: "Fire Magic Rules"
Matching Mode: KEYWORD
Keywords: fire magic, fire spell, pyromancy, flame spell
Insertion Order: 55
Priority: 9 (ESSENTIAL)
Content: "Fire magic requires: 
         - Direct line of sight to target
         - Mana cost: 10 points per spell
         - Cannot be cast underwater
         - Recharges at 5 mana/minute"
Why Keyword Mode: Critical game mechanic must fire 
         consistently when fire magic is mentioned
═══════════════════════════════════════════════════

User says: "I cast fire magic at the guard"
Result: Entry fires ✅ (exact keyword match)

User says: "I shoot a flame spell"
Result: Entry fires ✅ (keyword "flame spell" matches)

User says: "I throw fireball"
Result: Entry does NOT fire ❌ (no keyword match)

This is GOOD! You control exactly when critical rules apply. If "fireball" should also trigger the rule, add it to the keyword list.


🌊 When to Use Semantic Mode

✅ Use Semantic Mode For:

Entry Type | Why Semantic Mode Wins | Example
Atmospheric Lore | Flexible triggering adds immersion | "The forest feels ancient and watchful"
Cultural Details | Related concepts should trigger | Elven customs entry fires when "elf culture" is mentioned
Historical Events | Paraphrases should activate | "The great war" fires for "ancient battle"
Character Backstory | Emotional context matters | Trauma entry fires when the user describes a similar situation
World-Building | Broad topics benefit from flexibility | Kingdom politics entry fires for various government mentions
Flavor Text | Occasional firing is acceptable | Weather descriptions, ambient details
Relationships | Context-dependent activation | "Rivalry" fires for competition, conflict, tension
Factions/Organizations | Related terms should trigger | "Thieves Guild" fires for "criminal organization"

Semantic Mode Example: Atmospheric Lore

═══════════════════════════════════════════════════
Entry: "Haunted Forest Atmosphere"
Matching Mode: SEMANTIC (Vector Storage)
Keywords: [none - vector matching only]
Insertion Order: 30
Priority: 4 (STANDARD)
Probability: 70%
Content: "The ancient trees loom overhead, their 
         twisted branches blocking most sunlight.
         An eerie silence pervades, broken only by
         the occasional caw of a distant crow.
         The air feels heavy, as if watched by
         unseen eyes."
Why Semantic Mode: Should fire whenever conversation
         feels forest-related, even without exact keywords
═══════════════════════════════════════════════════

User says: "We walk through the creepy woods"
Result: Entry fires ✅ (semantic similarity to "haunted forest")

User says: "The ancient trees surround us"
Result: Entry fires ✅ (vector similarity detected)

User says: "This forest gives me chills"
Result: Entry fires ✅ (emotional context matches)

This is GOOD! Atmospheric entries benefit from flexible triggering.


🔧 How to Set Up Each Mode

Keyword Mode Setup (SillyTavern/MegaNova)

Step 1: Create Entry

Entry Name: "Dragon Lore"
Keywords: dragon, drake, wyrm, wyrms
Match Whole Words: ✅ Enabled
Case Sensitive: ❌ Disabled

Step 2: Configure Matching

Scan Depth: 4 (last 4 messages)
Token Budget: 25% of context
Probability: 100%

Step 3: Add Optional Filters

Secondary Keys: fire, breath, wings, scales
Logic: AND ANY (at least one secondary key)

Step 4: Test

Test message: "The dragon breathed fire"
Expected: Entry fires ✅
Test message: "The bird flew away"
Expected: Entry does NOT fire ❌
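
The four steps above can be approximated in a few lines of Python. This is a rough sketch under assumptions: `contains_word` and `entry_fires` are hypothetical names, and real platforms may scan messages and combine keys differently.

```python
import re

def contains_word(text: str, word: str) -> bool:
    """Whole-word, case-insensitive search for a single key."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None

def entry_fires(messages: list[str], primary: list[str],
                secondary: list[str], scan_depth: int = 4) -> bool:
    """Primary key required; with secondary keys set, at least one must also appear (AND ANY)."""
    if scan_depth <= 0:
        return False  # nothing is scanned
    window = "\n".join(messages[-scan_depth:])  # only the most recent messages count
    has_primary = any(contains_word(window, key) for key in primary)
    has_secondary = not secondary or any(contains_word(window, key) for key in secondary)
    return has_primary and has_secondary

primary = ["dragon", "drake", "wyrm", "wyrms"]
secondary = ["fire", "breath", "wings", "scales"]
print(entry_fires(["The dragon breathed fire"], primary, secondary))  # True
print(entry_fires(["The bird flew away"], primary, secondary))        # False
```

A side effect worth knowing: once the triggering message scrolls past the scan depth, the entry stops firing until a keyword appears again.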

Semantic Mode Setup (Vector Storage)

Prerequisites:

  1. Vector Storage extension enabled
  2. Embedding source configured (OpenAI, local model, etc.)
  3. "Enable for World Info" checked in Vector Storage settings

Step 1: Vectorize Entries

Entry Name: "Elven Culture"
Content: "Elves value nature, longevity, and artistic 
         expression. They live for centuries and 
         remember ancient traditions..."
Status: 🔗 Vectorized (embedding generated)

Step 2: Configure Vector Storage

Query Messages: 4 (last 4 messages scanned)
Max Entries: 5 (max vector-matched entries)
Similarity Threshold: 0.75 (adjust for precision)
Enabled for All Entries: ❌ (only vectorized entries)

Step 3: Set Entry Options

Keywords: [leave empty for pure semantic]
          OR keep keywords for hybrid approach
Insertion Order: 35
Priority: 5 (STANDARD)
Probability: 80%

Step 4: Test

Test message: "The elf explained their ancient customs"
Expected: Entry fires ✅ (semantic similarity)
Test message: "They discussed elven traditions"
Expected: Entry fires ✅ (related concept)
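
How Similarity Threshold and Max Entries interact can be sketched as a filter followed by a top-N cut. The similarity scores below are invented for illustration, and `select_entries` is a hypothetical helper, not part of any Vector Storage API.

```python
def select_entries(scores: dict[str, float], threshold: float = 0.75,
                   max_entries: int = 5) -> list[str]:
    """Keep entries at or above the threshold, then return the best few, highest score first."""
    passing = [(name, score) for name, score in scores.items() if score >= threshold]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in passing[:max_entries]]

# Hypothetical similarity scores for one incoming message
scores = {
    "Elven Culture": 0.88,
    "Haunted Forest Atmosphere": 0.79,
    "Tavern Atmosphere": 0.77,
    "Gods of the Realm": 0.52,
}

print(select_entries(scores, threshold=0.75, max_entries=5))
# ['Elven Culture', 'Haunted Forest Atmosphere', 'Tavern Atmosphere']
print(select_entries(scores, threshold=0.85, max_entries=5))
# ['Elven Culture']
```

Raising the threshold and lowering the entry cap both reduce how much fires, but in different ways: the threshold drops weak matches, while the cap only trims the tail of an already-sorted list.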

🎭 The Hybrid Approach (Best of Both Worlds)

Top creators don't choose one or the other. They use BOTH strategically.

Hybrid Strategy:

CRITICAL ENTRIES → Keyword Mode (deterministic)
IMPORTANT ENTRIES → Hybrid (keywords + vector)
FLAVOR ENTRIES → Semantic Mode (flexible)

Hybrid Entry Example:

═══════════════════════════════════════════════════
Entry: "Character's PTSD Triggers"
Matching Mode: HYBRID (Keywords + Vector)
Keywords: gunshot, explosion, combat, war
Vectorized: ✅ Yes
Insertion Order: 60
Priority: 8 (MAJOR)
Probability: 100%
Content: "Character has PTSD from military service.
         Loud noises cause flashbacks. Becomes 
         hypervigilant, may dive for cover. 
         Takes 1d4 rounds to calm down."
Why Hybrid: Must fire for explicit triggers (gunshot)
         BUT should also fire for related contexts
         (tense situations, arguments, stress)
═══════════════════════════════════════════════════

User says: "A gunshot rings out"
Result: Entry fires ✅ (keyword match)

User says: "The argument escalates, voices raised"
Result: Entry fires ✅ (vector similarity to combat stress)

User says: "They celebrated peacefully"
Result: Entry does NOT fire ❌ (no similarity)

Perfect! Critical entry fires for both explicit and contextual triggers.
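
In code terms, a hybrid entry is an OR between the two checks. The sketch below assumes the vector score has already been computed elsewhere; the scores, the 0.75 threshold, and the `hybrid_trigger` name are illustrative assumptions.

```python
import re
from typing import Optional

def hybrid_trigger(message: str, keywords: list[str],
                   vector_score: float, threshold: float = 0.75) -> Optional[str]:
    """Return which path fired the entry: 'keyword', 'vector', or None."""
    for keyword in keywords:
        if re.search(rf"\b{re.escape(keyword)}\b", message, re.IGNORECASE):
            return "keyword"  # deterministic path is checked first
    if vector_score >= threshold:
        return "vector"       # semantic fallback
    return None

keywords = ["gunshot", "explosion", "combat", "war"]
# Vector scores below are hypothetical stand-ins for real embedding similarity
print(hybrid_trigger("A gunshot rings out", keywords, 0.40))                    # keyword
print(hybrid_trigger("The argument escalates, voices raised", keywords, 0.81))  # vector
print(hybrid_trigger("They celebrated peacefully", keywords, 0.12))             # None
```

Returning the path that fired, rather than a bare True, makes debugging easier: you can tell at a glance whether a surprising activation came from a keyword or from the vector side.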


📈 Performance Comparison

Keyword Mode Performance:

Metric | Score | Notes
Precision | 95% | Fires only when it should
Recall | 70% | Misses synonyms/paraphrases
Consistency | 100% | Same input = same output
Token Efficiency | 90% | Minimal wasted tokens
Setup Time | Fast (5-10 min per entry) | List keywords, done
Debugging | Easy | Check if keyword present

Semantic Mode Performance:

Metric | Score | Notes
Precision | 65% | May fire for loosely related content
Recall | 90% | Catches most relevant contexts
Consistency | 75% | Varies by embedding model
Token Efficiency | 70% | May fire unexpectedly
Setup Time | Slow (15-30 min per entry) | Vectorize, tune thresholds
Debugging | Hard | Why did vector match?

Hybrid Mode Performance:

Metric | Score | Notes
Precision | 85% | Keywords provide anchor
Recall | 90% | Vectors catch paraphrases
Consistency | 95% | Keywords ensure baseline
Token Efficiency | 80% | Balanced approach
Setup Time | Medium (10-20 min per entry) | Both systems configured
Debugging | Medium | Check keyword first, then vector

🐛 Common Problems & Solutions

Problem #1: Keyword Mode Missing Triggers

Symptom: User says "wyrm" but "dragon" entry doesn't fire.

Cause: Synonym not in keyword list.

Solutions:

Option A: Add Synonyms

Keywords: dragon, drake, wyrm, wyrms, dragons, 
          fire-breathing, scaled beast

Option B: Use Regex

Keywords: /drak(en|e|es)/i, /wyr(m|ms)/i

Option C: Switch to Hybrid

Add keywords for common terms
Enable vector storage for semantic matching

Problem #2: Semantic Mode Fires Too Often

Symptom: Entry fires for vaguely related content.

Cause: Similarity threshold too low.

Solutions:

Option A: Increase Threshold

Similarity Threshold: 0.75 → 0.85
Result: More precise matching

Option B: Add Keywords as Filter

Keywords: forest, woods, trees (minimum trigger)
Vector: Enabled (expands matching)
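Result: Keywords give the entry a dependable minimum trigger
Note: This narrows matching only if your platform requires a keyword AND
      semantic similarity; if the entry fires on EITHER, raise the threshold instead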

Option C: Reduce Max Entries

Max Entries: 5 → 3
Result: Only top 3 most similar entries fire

Problem #3: Hybrid Mode Conflicts

Symptom: Entry fires via keyword but vector adds unrelated content.

Cause: Vector matching too broad.

Solutions:

Option A: Keyword Priority

Enable "Keyword entries first" in World Info settings
Result: Keyword matches take precedence

Option B: Separate Entries

Entry 1 (Keyword): Critical rules, exact triggers
Entry 2 (Vector): Atmospheric, flexible content
Result: Clear separation of concerns

Option C: Adjust Probability

Keyword-triggered probability: 100%
Vector-only probability: 50%
Result: Vector matches less likely to fire alone
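
The probability split in Option C amounts to a random roll after the entry has matched. Here is a small sketch, assuming a percentage-based roll; `passes_probability` and `should_fire` are hypothetical helpers, and whether a platform lets you set separate probabilities per trigger path is itself an assumption to verify.

```python
import random

def passes_probability(probability_percent: float, rng: random.Random) -> bool:
    """Roll once: 100 always passes, 0 never passes."""
    return rng.random() * 100 < probability_percent

def should_fire(triggered_by: str, rng: random.Random) -> bool:
    """Keyword matches always fire; vector-only matches fire about half the time."""
    probability = 100 if triggered_by == "keyword" else 50
    return passes_probability(probability, rng)

rng = random.Random(42)  # fixed seed so test runs are repeatable
keyword_hits = sum(should_fire("keyword", rng) for _ in range(1000))
vector_hits = sum(should_fire("vector", rng) for _ in range(1000))
print(keyword_hits)  # 1000
print(vector_hits)   # roughly half of 1000
```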

Problem #4: Regex Not Working

Symptom: A regex keyword doesn't match "Dragon" (for example, /dragon/ written without the i flag).

Cause: Regex syntax error or missing flags.

Solutions:

Option A: Check Delimiters

Correct: /dragon/i
Wrong: dragon/i (missing slashes)
Wrong: /dragon (missing closing slash)

Option B: Test Regex

Use regexr.com or similar tool
Test: /dragon/i against "The Dragon roared"
Expected: Match ✅

Option C: Check Case Sensitivity

/i flag = case insensitive
No flag = case sensitive
/dragon/i matches "dragon", "Dragon", "DRAGON"
/dragon/ matches only "dragon"
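
The delimiter and flag rules above can be checked mechanically. The following sketch parses a `/pattern/flags` keyword into a Python regex and returns None when the slashes are malformed. `parse_regex_keyword` is a hypothetical helper; platforms that use JavaScript regex may differ from Python's `re` syntax in some details.

```python
import re
from typing import Optional

# Flags handled by this sketch; anything else is treated as invalid
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}

def parse_regex_keyword(keyword: str) -> Optional[re.Pattern]:
    """Turn '/dragon/i' into a compiled pattern, or None if it is not /pattern/flags."""
    match = re.fullmatch(r"/(.+)/([a-z]*)", keyword.strip(), flags=re.DOTALL)
    if match is None:
        return None  # missing opening or closing slash
    body, flag_chars = match.groups()
    flags = 0
    for char in flag_chars:
        if char not in FLAG_MAP:
            return None  # unknown flag
        flags |= FLAG_MAP[char]
    return re.compile(body, flags)  # raises re.error if the pattern itself is invalid

print(parse_regex_keyword("/dragon/i").search("The Dragon roared") is not None)  # True
print(parse_regex_keyword("/dragon/").search("The Dragon roared") is not None)   # False
print(parse_regex_keyword("dragon/i"))  # None
print(parse_regex_keyword("/dragon"))   # None
```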

Problem #5: Vector Storage Not Firing

Symptom: Semantic entries never trigger.

Cause: Vector Storage not properly configured.

Solutions:

Option A: Enable Extension

Extensions → Vector Storage → Enable ✅
Embedding Source: Configure (OpenAI, local, etc.)

Option B: Vectorize Entries

Entry must have 🔗 (Vectorized) status
OR enable "Enabled for all entries" in settings

Option C: Check Query Messages

Vector Storage → Query Messages: 4
Scan Depth in World Info: Can be 0 (vectors only)

⏰ Time Investment

Task | Keyword Mode | Semantic Mode | Hybrid Mode
Setup Per Entry | 5-10 min | 15-30 min | 10-20 min
Testing | 5 min | 15 min | 10 min
Debugging | 5 min | 20 min | 10 min
Setup Total (20 entries) | 100-200 min | 300-600 min | 200-400 min

Keyword Mode: Fast setup, easy debugging
Semantic Mode: Slow setup, complex debugging
Hybrid Mode: Balanced approach

ROI: Hybrid mode provides best balance for most creators.
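
The 20-entry figures in the table equal the per-entry setup time multiplied by 20; testing and debugging come on top of that. A quick check of the arithmetic, where `SETUP_MINUTES` and `setup_total` are names made up for this sketch:

```python
# Per-entry setup time in minutes (low, high), taken from the table above
SETUP_MINUTES = {
    "keyword": (5, 10),
    "semantic": (15, 30),
    "hybrid": (10, 20),
}

def setup_total(mode: str, entries: int = 20) -> tuple[int, int]:
    """Scale the per-entry setup range to a whole lorebook."""
    low, high = SETUP_MINUTES[mode]
    return (low * entries, high * entries)

for mode in SETUP_MINUTES:
    print(mode, setup_total(mode))
# keyword (100, 200)
# semantic (300, 600)
# hybrid (200, 400)
```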


📊 Real-World Lorebook Examples

Example 1: Fantasy Campaign (Keyword-Heavy)

═══════════════════════════════════════════════════
Entry: "Magic System - Mana Costs"
Mode: KEYWORD
Keywords: mana, spell slot, magic points, MP
Priority: 10 (CRITICAL)
Content: "Spell costs: 
         Cantrip: 0 MP
         Level 1: 5 MP
         Level 2: 10 MP
         Level 3: 20 MP
         Recharge: 5 MP/hour rest"
Why Keyword: Core mechanic, must fire consistently
═══════════════════════════════════════════════════

═══════════════════════════════════════════════════
Entry: "Gods of the Realm"
Mode: HYBRID
Keywords: god, deity, divine, temple, prayer
Vectorized: ✅ Yes
Priority: 7 (IMPORTANT)
Content: "Five major gods rule the realm:
         Solara (sun), Lunara (moon), Terran (earth),
         Aquara (water), Ventara (air). Temples
         provide healing and blessings."
Why Hybrid: Specific god names need keywords,
         but religious discussions should trigger too
═══════════════════════════════════════════════════

═══════════════════════════════════════════════════
Entry: "Tavern Atmosphere"
Mode: SEMANTIC
Keywords: [none]
Vectorized: ✅ Yes
Priority: 3 (LOW)
Probability: 60%
Content: "The tavern buzzes with conversation.
         Ale flows freely, the hearth crackles,
         and a bard plays a lively tune in the
         corner. The smell of roasted meat fills
         the air."
Why Semantic: Should fire whenever scene feels
         tavern-like, even without exact keywords
═══════════════════════════════════════════════════

Example 2: Sci-Fi Setting (Balanced Hybrid)

═══════════════════════════════════════════════════
Entry: "FTL Travel Rules"
Mode: KEYWORD
Keywords: FTL, faster-than-light, warp drive, 
          hyperspace, jump drive
Priority: 9 (ESSENTIAL)
Content: "FTL travel requires:
         - Charged drive (24 hours)
         - Clear navigation path (no gravity wells)
         - Cooldown: 1 hour between jumps
         - Risk: 0.1% misjump per 100 light-years"
Why Keyword: Critical sci-fi mechanic, exact rules
═══════════════════════════════════════════════════

═══════════════════════════════════════════════════
Entry: "AI Rights Movement"
Mode: HYBRID
Keywords: AI rights, synthetic rights, robot liberation
Vectorized: ✅ Yes
Priority: 6 (IMPORTANT)
Content: "The Synthetic Liberation Front fights for
         AI personhood. Founded 2147 after the
         Mars Uprising. Considered terrorists by
         EarthGov, freedom fighters by supporters."
Why Hybrid: Specific org name needs keywords,
         but discussions of AI oppression should trigger
═══════════════════════════════════════════════════

═══════════════════════════════════════════════════
Entry: "Space Station Ambiance"
Mode: SEMANTIC
Keywords: [none]
Vectorized: ✅ Yes
Priority: 4 (STANDARD)
Probability: 70%
Content: "The station hums with recycled air.
         Holographic displays flicker with star
         charts. Crew members float through
         zero-gravity corridors. Earth hangs
         in the viewport, a blue marble against
         the void."
Why Semantic: Should fire for any space station
         scene, regardless of specific wording
═══════════════════════════════════════════════════

❓ Frequently Asked Questions (FAQ)

What's the main difference between keyword and semantic matching?

Keyword matching = Exact word or pattern match (deterministic)
Semantic matching = Meaning/context similarity via embeddings (probabilistic)

Analogy:

  • Keyword = Searching for exact phrase in a book index
  • Semantic = Asking a librarian "What books are about this topic?"

Can I use both modes on the same entry?

Yes! This is the hybrid approach.

How it works:

  1. Entry has keywords (for exact matching)
  2. Entry is vectorized (for semantic matching)
  3. Entry fires if EITHER condition is met

Best practice: Use hybrid for important-but-not-critical entries.


Which mode is more token-efficient?

Keyword mode is more token-efficient.

Why:

  • Fires only when specific words appear
  • Predictable token usage
  • No unexpected activations

Semantic mode can waste tokens:

  • May fire for loosely related content
  • Harder to predict when it will activate
  • May need lower probability to compensate

Do I need an API key for semantic matching?

It depends on the embedding source. Vector storage requires embeddings: hosted providers need an API key, while a local model does not.

Options:

Embedding Source | Cost | Setup
OpenAI | ~$0.0001/embedding | Easy (API key)
Local Model | Free | Complex (hardware)
HuggingFace | Free tier available | Medium
Cohere | Paid | Easy

Keyword mode: No API key required.


What similarity threshold should I use?

Depends on precision needs:

Threshold | Behavior | Use Case
0.90+ | Very strict, few matches | Critical entries
0.75-0.89 | Balanced, good matches | Most entries
0.60-0.74 | Lenient, many matches | Atmospheric/flavor
<0.60 | Very loose, questionable | Not recommended

Start at 0.75, adjust based on results.


How many keywords should I list per entry?

Keyword Mode:

  • Minimum: 3-5 core terms
  • Recommended: 5-10 synonyms/variations
  • Maximum: 15-20 (diminishing returns)

Hybrid Mode:

  • Minimum: 2-3 anchor keywords
  • Recommended: 3-5 keywords + vector
  • Vector handles: Synonyms, paraphrases, context

Can regex replace semantic matching?

Partially, but not completely.

Regex CAN:

  • Match variations (dragon/dragons/draconic)
  • Handle optional words (/(the )?dragon/)
  • Match patterns (/[0-9]+d[0-9]+/ for dice rolls)

Regex CANNOT:

  • Understand synonyms (wyrm, drake, serpent)
  • Grasp context ("ancient beast" → dragon)
  • Handle paraphrases ("fire-breathing reptile")

Best approach: Regex for variations, semantic for meaning.
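
Here is what the "regex CAN" list looks like in practice, written with Python's `re` module. The patterns are illustrative examples rather than recommended lorebook keys, and platform regex dialects may differ slightly.

```python
import re

# Variations of one word: dragon, dragons, draconic
variations = re.compile(r"\bdra(?:gons?|conic)\b", re.IGNORECASE)

# Optional leading word: "dragon" or "the dragon"
optional_article = re.compile(r"\b(?:the\s+)?dragon\b", re.IGNORECASE)

# Structural pattern: dice notation such as 2d6 or 1d20
dice_roll = re.compile(r"\b\d+d\d+\b", re.IGNORECASE)

print(bool(variations.search("Draconic runes cover the wall")))  # True
print(optional_article.search("I saw the dragon").group())       # the dragon
print(dice_roll.search("Roll 2d6 for damage").group())           # 2d6

# What regex cannot do: a synonym with different spelling never matches
print(bool(variations.search("The wyrm exhaled flames")))        # False
```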


What if my platform doesn't support semantic matching?

Stick with keyword mode + smart keyword lists.

Strategies:

  1. List all synonyms:

    Keywords: dragon, drake, wyrm, wyrms, 
              fire-breather, scaled beast, winged serpent
    
  2. Use regex for variations:

    Keywords: /drak(en|e|es|ic)/i, /wyr(m|ms)/i
    
  3. Add secondary keys for context:

    Primary: dragon
    Secondary: fire, wings, scales, cave, treasure
    Logic: AND ANY
    

Should I convert existing keyword lorebooks to semantic?

Not necessarily.

Convert if:

  • Entries miss triggers due to synonym issues
  • You want more flexible, atmospheric triggering
  • You have token budget to spare

Keep keyword if:

  • Entries are critical rules/mechanics
  • Current setup works reliably
  • Token efficiency is priority

Hybrid approach: Keep critical entries as keyword, add semantic for flavor/atmosphere.


How do I debug semantic matching issues?

Step 1: Check Vector Status

Entry must show 🔗 (Vectorized) icon
If not: Re-vectorize or enable "Enabled for all entries"

Step 2: Review Query Messages

Vector Storage → Query Messages: 4
Ensure last 4 messages are being scanned

Step 3: Test Similarity

Send test message
Check Vector Storage debug panel
View similarity scores for each entry

Step 4: Adjust Threshold

Too many matches: Increase threshold (0.75 → 0.85)
Too few matches: Decrease threshold (0.75 → 0.65)

What's the #1 mistake creators make with matching modes?

Using semantic mode for critical mechanics.

Example:

WRONG:
Entry: "Character is immortal"
Mode: Semantic
Result: May not fire when needed (unpredictable)

RIGHT:
Entry: "Character is immortal"
Mode: Keyword
Keywords: immortal, immortality, cannot die, eternal
Result: Always fires when relevant (deterministic)

Rule of thumb: If forgetting this info would break the character, use keyword mode.


🚀 Ready to Optimize Your Lorebook Matching?

Stop guessing. Start engineering.

Get Started in 5 Steps:

  1. Sign Up Free - studio.meganova.ai
  2. Audit Your Lorebook - Identify critical vs atmospheric entries
  3. Assign Matching Modes - Keyword for rules, semantic for flavor
  4. Configure Vector Storage - If using semantic/hybrid
  5. Test for 30 Messages - Adjust thresholds and keywords

No credit card required. Start optimizing in under 5 minutes.

👉 Optimize Your Lorebook Now


📢 Share Your Matching Mode Tips!

Found a clever hybrid setup? Share it!

  • Discord: Join MegaNova Discord and share in #lorebook-help
  • Twitter/X: Tag @meganovaai with #LorebookMatching
  • Reddit: Post in r/SillyTavernAI, r/JanitorAI, r/CharacterAI

Best lorebook setups get featured on the MegaNova blog! 🌟


📋 Quick Reference: Matching Mode Decision Tree

Is this entry CRITICAL (character would break without it)?
│
├─ YES → KEYWORD MODE
│   └─ List all synonyms/variations as keywords
│   └─ Use regex for patterns
│   └─ Priority: 9-10
│
└─ NO → Is this entry IMPORTANT (significantly affects characterization)?
    │
    ├─ YES → HYBRID MODE
    │   └─ Add 3-5 anchor keywords
    │   └─ Enable vector storage
    │   └─ Priority: 6-8
    │
    └─ NO → SEMANTIC MODE
        └─ No keywords needed (or minimal)
        └─ Enable vector storage
        └─ Priority: 1-5
        └─ Probability: 50-80%
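
If you audit entries in bulk, the tree above translates directly into a small function. This is a sketch of the decision logic only; `choose_mode` and the dictionary layout are assumptions made for this example, not a platform setting format.

```python
def choose_mode(critical: bool, important: bool) -> dict:
    """Map the two audit questions to a matching mode and suggested ranges."""
    if critical:
        # Character would break without it: deterministic triggers only
        return {"mode": "KEYWORD", "vectorized": False, "priority": (9, 10)}
    if important:
        # Significantly affects characterization: anchor keywords plus vectors
        return {"mode": "HYBRID", "vectorized": True, "priority": (6, 8)}
    # Everything else: flexible, occasional firing is acceptable
    return {"mode": "SEMANTIC", "vectorized": True, "priority": (1, 5),
            "probability": (50, 80)}

print(choose_mode(critical=True, important=True)["mode"])    # KEYWORD
print(choose_mode(critical=False, important=True)["mode"])   # HYBRID
print(choose_mode(critical=False, important=False)["mode"])  # SEMANTIC
```

The critical check comes first on purpose: an entry that is both critical and important still lands in keyword mode.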

💡 Pro Tips from Lorebook Masters

"I use keyword mode for anything that affects game mechanics. Semantic for everything else. Never mix them for critical rules."
Top SillyTavern Creator (500+ lorebooks)

"Hybrid mode is the secret weapon. Keywords ensure baseline triggering, vectors catch the stuff you forgot to keyword."
MegaNova Studio Moderator

"My atmospheric entries are all semantic. My magic system is all keyword. Never had issues since I made that split."
Professional Campaign Designer

"Test your semantic threshold! I run at 0.82 for important entries, 0.68 for flavor. One size does NOT fit all."
Vector Storage Power User



💡 Pro Tip: Start with keyword mode for your first lorebook. Add semantic matching once you're comfortable with the basics. Create your free MegaNova Studio account to start optimizing!