How It Works
A Sublime Text 4 plugin that automatically detects and sets the correct syntax (language) for views using a cascading strategy pipeline — from user-defined rules through deep learning.
Architecture Overview
flowchart TD
ST["Sublime Text Events"] --> Boot["boot.py<br/>module cleanup"]
Boot --> Init["plugin/__init__.py<br/>plugin_loaded()"]
Init --> Custom["Load custom<br/>matches & constraints"]
Init --> Settings["AioSettings<br/>async setting watcher"]
Init --> Compile["compile_rules()<br/>per window"]
Compile --> Collection["SyntaxRuleCollection"]
Settings -.->|on change| Compile
ST -->|view events| Listener["EventListener<br/>+ TextChangeListener"]
Listener --> Pipeline["run_auto_set_syntax_on_view()"]
Pipeline --> Cascade["Strategy Cascade<br/>(9 steps)"]
Cascade --> Assign["assign_syntax_to_view()"]
Collection -->|test rules| Pipeline

Entry Point & Lifecycle
boot.py
Clears previously loaded plugin modules on reload, then imports plugin/__init__.py. This ensures a clean state during development.
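The cleanup step can be pictured with a minimal sketch like the one below. It is an illustration rather than the plugin's actual boot.py: the helper name purge_modules and the demo_pkg package are hypothetical, and the demo registers fake modules so it runs outside Sublime Text.

```python
from __future__ import annotations

import sys
import types


def purge_modules(prefix: str) -> list[str]:
    """Drop every loaded module named `prefix` or living under `prefix.`."""
    doomed = [
        name for name in list(sys.modules)
        if name == prefix or name.startswith(prefix + ".")
    ]
    for name in doomed:
        del sys.modules[name]  # the next import re-executes the module
    return sorted(doomed)


# Demo with fake modules (hypothetical package name).
sys.modules["demo_pkg"] = types.ModuleType("demo_pkg")
sys.modules["demo_pkg.plugin"] = types.ModuleType("demo_pkg.plugin")
removed = purge_modules("demo_pkg")
```

Removing the entries from sys.modules forces a later import to run the module code again, which is what gives a reload its clean state.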
plugin/__init__.py
On plugin_loaded() (ST lifecycle hook):
- Adds the vendored Python lib path to sys.path
- Loads custom implementations from AutoSetSyntax-Custom/matches/ and AutoSetSyntax-Custom/constraints/
- Sets up AioSettings, which watches settings files asynchronously and recompiles rules on change
- Calls set_up_window() for each open window: compiles rules, logs version info, checks Magika availability
- Optionally runs syntax detection on startup views
On plugin_unloaded():
- Tears down settings watchers and per-window state (log panels, rule collections)
Trigger Events
flowchart LR
subgraph Events["ST Event Listeners"]
Load["on_load"] -->|"ListenerEvent.LOAD"| Pipeline
New["on_new"] -->|"ListenerEvent.NEW"| Pipeline
Save["on_post_save"] -->|"ListenerEvent.SAVE"| Pipeline
Reload["on_reload"] -->|"ListenerEvent.RELOAD"| Pipeline
Activate["on_activated"] -->|"ListenerEvent.UNTRANSIENTIZE"| Pipeline
Change["on_text_changed_async"] -->|"ListenerEvent.MODIFY/PASTE"| Pipeline
Exec["on_post_window_command"] -->|"ListenerEvent.EXEC"| Pipeline
Revert["on_revert"] -->|"ListenerEvent.REVERT"| Pipeline
end
Pipeline["run_auto_set_syntax_on_view()"]
Pipeline --> PreChecks

All events converge on run_auto_set_syntax_on_view() in plugin/commands/auto_set_syntax.py.
Per-Event Behavior Detail
| Event | Listener | Trigger | Notes |
|---|---|---|---|
| LOAD | on_load | File opened | Marks transient state |
| NEW | on_new | New untitled file | |
| SAVE | on_post_save | After save | |
| RELOAD | on_reload | File reloaded | |
| UNTRANSIENTIZE | on_activated | Preview → permanent | Only fires once per view |
| MODIFY | on_text_changed_async | Typing in first/last lines | Debounced, plaintext only |
| PASTE | on_text_changed_async | Large text insertion | Plaintext only |
| EXEC | on_post_window_command | Build output panel | |
| REVERT | on_revert | File reverted | |
| COMMAND | auto_set_syntax command | Manual trigger | |
| INIT | Startup views | Plugin load | |
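The debouncing noted for MODIFY can be sketched as a trailing-edge debounce: each new call cancels the pending one, so a burst of keystrokes triggers one detection run. This is an assumption about the general technique, not the plugin's code; the Debouncer class, the on_modify callback, and the 0.2 s delay are all hypothetical.

```python
import threading


class Debouncer:
    """Run `func` only after `delay` seconds pass without another call."""

    def __init__(self, func, delay: float) -> None:
        self._func = func
        self._delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def __call__(self, *args) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer call supersedes the pending one
            self._timer = threading.Timer(self._delay, self._func, args)
            self._timer.daemon = True
            self._timer.start()


calls = []
done = threading.Event()


def on_modify(text: str) -> None:
    # Hypothetical stand-in for "run detection on the view".
    calls.append(text)
    done.set()


debounced = Debouncer(on_modify, delay=0.2)
for text in ("#", "#!", "#!/bin/sh"):
    debounced(text)  # three rapid edits collapse into a single run
done.wait(timeout=5)
```

Only the last of the three rapid calls reaches on_modify, which keeps detection from running on every keystroke.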
Prerequisites (pre-flight checks)
Before any strategy runs, the pipeline verifies:
- View has a window and is valid
- View is "syntaxable": not a widget/panel, not transient, within size limit
- View is plaintext (if must_plaintext=True)
- SyntaxRuleCollection is compiled for the window
- Plugin is ready (G.is_plugin_ready())
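The checks above can be sketched as an ordered list of guard conditions that stops at the first failure. Everything here is a stand-in: FakeView, its fields, the size_limit default, and the reason strings are hypothetical and do not mirror the Sublime Text API.

```python
from dataclasses import dataclass


@dataclass
class FakeView:
    """Stand-in for sublime.View; every field is hypothetical."""
    has_window: bool = True
    valid: bool = True
    is_widget: bool = False
    is_transient: bool = False
    size: int = 0
    is_plaintext: bool = True


def passes_prechecks(view, *, must_plaintext, rules_compiled, plugin_ready,
                     size_limit=5_000_000):
    """Return (ok, reason) so that a failed check can be logged."""
    checks = [
        (view.has_window and view.valid, "no window or invalid view"),
        (not view.is_widget, "widget or panel"),
        (not view.is_transient, "transient view"),
        (view.size <= size_limit, "over size limit"),
        (view.is_plaintext or not must_plaintext, "not plaintext"),
        (rules_compiled, "rules not compiled"),
        (plugin_ready, "plugin not ready"),
    ]
    for ok, reason in checks:
        if not ok:
            return False, reason
    return True, "ok"


ready = dict(must_plaintext=True, rules_compiled=True, plugin_ready=True)
good = passes_prechecks(FakeView(), **ready)
transient = passes_prechecks(FakeView(is_transient=True), **ready)
typed = passes_prechecks(FakeView(is_plaintext=False), **ready)
```

Returning the reason alongside the result makes it easy to log why a view was skipped.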
The Strategy Pipeline (9 Steps)
The pipeline tries each strategy in order, stopping at the first match:
- Exec Output: assign exec_file_syntax for build panels
- New File: assign new_file_syntax for untitled files
- ST Syntax Test: skip if file is an ST syntax test
- Plugin Rules: iterate the user-defined SyntaxRule collection
- First Line: detect shebang (#!/usr/bin/env) or modeline (-*- mode -*-)
- Trimmed Filename: strip suffixes and match the base filename
- Magika (DL): Google's deep-learning content-type detection
- Heuristics: content-based guess (currently JSON detection)
- Give Up: leave as plain text
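The "first match wins" cascade can be illustrated with a reduced sketch covering three of the nine steps. The dict-based snapshot, the strategy functions, and the interpreter-to-syntax table are simplifications invented for this example; the plugin's real strategies are more involved.

```python
from __future__ import annotations

import json
import re


def new_file(snap: dict) -> str | None:
    # Untitled buffer: use the configured default, if any.
    if snap.get("path") is None and snap.get("new_file_syntax"):
        return snap["new_file_syntax"]
    return None


def first_line(snap: dict) -> str | None:
    # Shebang such as "#!/usr/bin/env python3" or "#!/bin/sh".
    m = re.match(r"#!\s*(?:/usr/bin/env\s+)?(\S+)", snap["content"])
    if not m:
        return None
    interpreter = m.group(1).rsplit("/", 1)[-1]
    known = {"python": "Python", "python3": "Python", "sh": "Bash", "bash": "Bash"}
    return known.get(interpreter)  # hypothetical lookup table


def heuristics(snap: dict) -> str | None:
    # Content-based guess: does the whole buffer parse as JSON?
    text = snap["content"].strip()
    if text[:1] not in ("{", "["):
        return None
    try:
        json.loads(text)
    except ValueError:
        return None
    return "JSON"


STRATEGIES = [("new_file", new_file), ("first_line", first_line), ("heuristics", heuristics)]


def run_cascade(snap: dict) -> tuple[str, str | None]:
    for name, strategy in STRATEGIES:
        syntax = strategy(snap)
        if syntax is not None:
            return name, syntax  # stop at the first strategy that matches
    return "give_up", None  # leave the view as plain text
```

Keeping the strategies in a plain ordered list makes the priority explicit and lets a new step be slotted in without touching the loop.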
Rules System
The rules system is a tree of SyntaxRule objects, each containing a nested match/constraint tree.
flowchart TD
SCR["SyntaxRuleCollection<br/>ordered list of SyntaxRules"] -->|test each| SR1["SyntaxRule 1"]
SCR --> SR2["SyntaxRule 2"]
SCR --> SR3["... N"]
SR1 -->|properties| Selector["selector: 'text.plain'"]
SR1 -->|properties| OnEvents["on_events: [LOAD, SAVE]"]
SR1 -->|properties| Syntaxes["syntax: 'source.python'"]
SR1 --> MR["root_rule: MatchRule"]
MR --> Match["AbstractMatch<br/>(any / all / some / ratio)"]
Match --> CR1["ConstraintRule<br/>(leaf condition)"]
Match --> CR2["ConstraintRule"]
Match --> NestedMatch["MatchRule<br/>(nested)"]
NestedMatch --> CR3["ConstraintRule"]
NestedMatch --> CR4["ConstraintRule"]
CR1 --> Constraint["AbstractConstraint<br/>e.g. is_extension"]

SyntaxRule
The top-level rule configured in settings:
    {
        "comment": "Python files",
        "syntaxes": ["Python", "scope:source.python"],
        "selector": "text.plain",
        "on_events": ["LOAD", "SAVE"],
        "match": {
            "match": "all",
            "rules": [
                { "constraint": "is_extension", "args": [".py"] }
            ]
        }
    }
Fields:
- syntax / syntaxes: The target syntax to assign
- selector: Scope filter; the rule only applies if the current scope matches (default: text.plain)
- on_events: Restrict which events trigger this rule (None = all events)
- root_rule: A MatchRule (the match/constraint tree)
- comment: Human-readable label (for logging/debugging)
- src_setting: Reference back to the original setting object
MatchRule + AbstractMatch
A MatchRule combines a match strategy (AbstractMatch) with child rules:
| Match | Behavior | Droppable When |
|---|---|---|
| any | At least one child passes | No child rules |
| all | Every child passes | No child rules |
| some(n) | At least n children pass | n > number of children |
| ratio(n/d) | At least n out of d pass | Bad ratio parameters |
Children can be ConstraintRule (leaf) or nested MatchRule (sub-tree).
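The four match strategies in the table can be sketched as a single evaluation function over the children's boolean results. Two points are assumptions rather than facts from the plugin: ratio is read as "the passing share is at least n/d", and "bad ratio parameters" is taken to mean d <= 0, n < 0, or n > d.

```python
def evaluate(kind, results, n=0, d=1):
    """Combine child results according to the match strategy `kind`."""
    passed = sum(1 for r in results if r)
    if kind == "any":
        return passed >= 1
    if kind == "all":
        return passed == len(results)
    if kind == "some":
        return passed >= n
    if kind == "ratio":
        # Integer comparison of passed/len(results) >= n/d, avoiding floats.
        return len(results) > 0 and passed * d >= n * len(results)
    raise ValueError(f"unknown match kind: {kind}")


def is_droppable(kind, child_count, n=0, d=1):
    """Mirror the 'Droppable When' column (ratio check is an assumption)."""
    if kind in ("any", "all"):
        return child_count == 0
    if kind == "some":
        return n > child_count
    if kind == "ratio":
        return d <= 0 or n < 0 or n > d
    raise ValueError(f"unknown match kind: {kind}")
```

For example, some(5) over three children can never pass, so is_droppable("some", 3, n=5) reports it as prunable.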
ConstraintRule + AbstractConstraint
A ConstraintRule wraps an AbstractConstraint with optional inversion:
| Constraint | What It Checks |
|---|---|
| is_extension | File extension matches |
| is_name | Filename matches |
| contains / contains_regex | Content contains text / regex |
| first_line_contains / first_line_contains_regex | First line matches |
| is_syntax | Current syntax matches |
| is_size | File size within range |
| is_line_count | Line count within range |
| is_interpreter | Shebang interpreter matches |
| is_hidden_syntax | Syntax is a hidden/private syntax |
| is_platform / is_platform_arch | OS / architecture matches |
| is_arch | CPU architecture matches |
| is_in_git_repo / is_in_hg_repo / is_in_svn_repo | VCS repository check |
| is_in_python_django_project | Django project check |
| is_in_ruby_on_rails_project | Rails project check |
| is_magika_enabled | Magika availability check |
| name_contains / name_contains_regex | Filename substring / regex |
| path_contains / path_contains_regex | Full path substring / regex |
| relative_exists | Relative path/file exists in project |
| selector_matches | Scope selector matches |
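The "optional inversion" wrapper can be sketched as follows. This is a simplified model, not the plugin's class hierarchy: the constraint is a plain function over a Path instead of an AbstractConstraint subclass, and the names passes and inverted are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


def is_extension(*extensions: str) -> Callable[[Path], bool]:
    """Build a constraint that passes when the file name ends with one of `extensions`."""
    def test(path: Path) -> bool:
        return path.name.endswith(extensions)  # str.endswith accepts a tuple
    return test


@dataclass(frozen=True)
class ConstraintRule:
    test: Callable[[Path], bool]
    inverted: bool = False

    def passes(self, path: Path) -> bool:
        result = self.test(path)
        return (not result) if self.inverted else result


python_only = ConstraintRule(is_extension(".py", ".pyw"))
not_python = ConstraintRule(is_extension(".py", ".pyw"), inverted=True)
```

Putting the inversion in the wrapper means each constraint only has to express the positive condition.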
Optimization
At compile time, sift_optimizable() optimizes the rule tree by pruning rules that are "droppable" (dead or no-op):
- SyntaxRule.is_droppable() → true if the rule has no syntax, an empty on_events list (as opposed to None, which means unrestricted), or no root_rule
- MatchRule.is_droppable() → true if it has no children or the match is ineffective (e.g., some(5) with 3 children)
- ConstraintRule.is_droppable() → true if the constraint is None or is itself droppable
Dropped rules are logged and stored in G.dropped_rules_collection for debugging.
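Bottom-up pruning of a rule tree can be sketched as below. The Node type, its fields, and the prune function are invented for illustration; they only model the "no children" and some(n) cases described above and do not reproduce sift_optimizable().

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "any", "all", "some", or "constraint"
    name: str = ""                 # constraint name; empty means unresolved
    n: int = 0                     # threshold for "some"
    children: list[Node] = field(default_factory=list)


def prune(node: Node, dropped: list[Node]) -> Node | None:
    """Return the optimized node, or None if it is droppable."""
    if node.kind == "constraint":
        if not node.name:
            dropped.append(node)
            return None
        return node
    # Prune children first so a parent sees only its surviving children.
    pruned = (prune(child, dropped) for child in node.children)
    kept = [child for child in pruned if child is not None]
    result = Node(node.kind, node.name, node.n, kept)
    if not kept or (node.kind == "some" and node.n > len(kept)):
        dropped.append(result)
        return None
    return result


leaf = lambda name: Node("constraint", name)
tree = Node("all", children=[
    leaf("is_extension"),
    Node("some", n=5, children=[leaf("a"), leaf("b"), leaf("c")]),
    Node("any"),
])
dropped: list[Node] = []
optimized = prune(tree, dropped)
```

Collecting the pruned nodes in a separate list mirrors the idea of keeping dropped rules around for debugging.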
ViewSnapshot
Before any strategy runs, a ViewSnapshot is created: a frozen snapshot of the view's state at that moment.
- view: The sublime.View object
- content: Full text content
- first_line: First line of content
- char_count, line_count: Size metrics
- path_obj: Path() object (or None if unsaved)
- file_extensions: List of suffixes (e.g., .tar.gz → ['.tar', '.gz'])
- file_name, file_name_unhidden: With/without leading dot
- syntax: Current syntax object
- encoding: Encoding (defaults to UTF-8 for unsaved buffers)
- caret_rowcol: Cursor position (row, col) for edit-aware decisions
- content_bytes, encoding_py: Lazy-computed derivatives
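A reduced sketch of such a snapshot, assuming a frozen dataclass with a few derived fields. It keeps only a handful of the fields listed above, stores the encoding as a Python codec name for simplicity, and computes content_bytes lazily with functools.cached_property; the real class may be organized differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ViewSnapshot:
    content: str
    path_obj: Optional[Path] = None  # None for unsaved buffers
    encoding: str = "utf-8"

    @property
    def first_line(self) -> str:
        return self.content.partition("\n")[0]

    @property
    def file_extensions(self) -> list[str]:
        return self.path_obj.suffixes if self.path_obj is not None else []

    @cached_property
    def content_bytes(self) -> bytes:
        # Encoded once on first access, then cached on the instance.
        return self.content.encode(self.encoding)


snap = ViewSnapshot("#!/bin/sh\necho hi\n", Path("backup.tar.gz"))
unsaved = ViewSnapshot("hello")
```

Freezing the dataclass means every strategy in the cascade sees the same state, even if the user keeps typing while detection runs.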
Final Assignment
assign_syntax_to_view():
flowchart TD
In["assign_syntax_to_view(view, syntax)"] --> Valid{"view.is_valid()?"}
Valid -->|no| False["return False"]
Valid -->|yes| Views["Get all views<br/>sharing the buffer"]
Views --> Loop["For each view"]
Loop --> Same{"Syntax already<br/>the same?"}
Same -->|yes| LogKeep["Log: [ALREADY]<br/>skip"]
Same -->|no| AssignST["view.assign_syntax(syntax)"]
AssignST --> Flag["Set VIEW_KEY_IS_ASSIGNED"]
Flag --> LogAssign["Log: old → new<br/>+ reason"]
LogKeep --> Next["Continue loop"]
LogAssign --> Next
Next --> Done["return True"]

- Validates the view
- Gets all sibling views sharing the same buffer (via view.buffer().views())
- For each view: skips it if it already has the target syntax, otherwise calls view.assign_syntax(syntax)
- Sets VIEW_KEY_IS_ASSIGNED on view settings
- Logs the change with full context (old syntax → new syntax + reason + event)
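The flow above can be modeled with a small stand-in, shown here as a sketch under stated assumptions: FakeView replaces sublime.View, the siblings list replaces view.buffer().views(), and the "is_assigned" key and log strings are hypothetical.

```python
class FakeView:
    """Stand-in for sublime.View; attribute names are hypothetical."""

    def __init__(self, syntax: str, valid: bool = True) -> None:
        self.syntax = syntax
        self.valid = valid
        self.settings = {}
        self.siblings = [self]  # views sharing the same buffer


def assign_syntax_to_view(view, syntax: str, log: list) -> bool:
    if not view.valid:
        return False
    for sibling in view.siblings:
        if sibling.syntax == syntax:
            log.append("[ALREADY] " + syntax)
            continue  # nothing to change for this view
        old, sibling.syntax = sibling.syntax, syntax
        sibling.settings["is_assigned"] = True
        log.append(old + " -> " + syntax)
    return True


a, b = FakeView("Plain Text"), FakeView("Python")
a.siblings = b.siblings = [a, b]  # two views onto one buffer
log = []
ok = assign_syntax_to_view(a, "Python", log)
closed = assign_syntax_to_view(FakeView("Plain Text", valid=False), "Python", [])
```

Iterating over every view of the buffer keeps split or cloned views in sync, and the early skip avoids redundant assignments.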
Extensibility
Users can add custom AbstractMatch or AbstractConstraint implementations:
- Create a Python file in Packages/AutoSetSyntax-Custom/matches/ or Packages/AutoSetSyntax-Custom/constraints/
- Subclass AbstractMatch or AbstractConstraint
- Implement test() (and optionally is_droppable())
- The class name convention determines the setting name: FooBarMatch → "foo_bar" and BazConstraint → "baz"
Auto-discovered at plugin load via _load_custom_implementations() using pkgutil.iter_modules().
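The naming convention can be approximated with a short helper: strip the class suffix, then convert CamelCase to snake_case. The function setting_name is hypothetical, and the plugin's own conversion may treat edge cases such as acronyms differently.

```python
import re


def setting_name(class_name: str, suffix: str) -> str:
    """Derive a setting name, e.g. FooBarMatch -> foo_bar (illustrative only)."""
    base = class_name[: -len(suffix)] if class_name.endswith(suffix) else class_name
    # Insert "_" before every capital letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", base).lower()


match_name = setting_name("FooBarMatch", "Match")
constraint_name = setting_name("BazConstraint", "Constraint")
```

The resulting string is what a rule would reference in its "match" or "constraint" key.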
Magika Integration
Magika (Google's deep-learning file type detector) is an optional dependency:
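    from __future__ import annotations  # keeps the return annotation unevaluated if magika is missing

    def get_magika_object() -> magika.Magika | None:
        # Import lazily: Magika is optional and may not be installed.
        try:
            from magika import Magika
            from magika import PredictionMode
        except ImportError:
            return None
        return Magika(prediction_mode=PredictionMode.HIGH_CONFIDENCE)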
- Downloaded separately; not vendored
- Uses HIGH_CONFIDENCE prediction mode (to reduce false positives)
- Only invoked for extensionless plaintext files (unless triggered via command)
- Results mapped to ST syntaxes via magika.syntax_map settings
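The gating and mapping described above might look like the sketch below. It does not call Magika itself, and the shape of the syntax map (a label mapped to an ordered list of candidate syntaxes) is an assumption made for illustration; the function names and sample values are hypothetical.

```python
from __future__ import annotations


def should_run_magika(file_extensions: list[str], is_plaintext: bool, event: str) -> bool:
    """Gate: a manual command always qualifies; otherwise require extensionless plaintext."""
    if event == "COMMAND":
        return True
    return is_plaintext and not file_extensions


def map_label(label: str, syntax_map: dict[str, list[str]], installed: set[str]) -> str | None:
    """Pick the first candidate syntax for `label` that is actually installed."""
    for candidate in syntax_map.get(label, []):
        if candidate in installed:
            return candidate
    return None


# Hypothetical map and installed syntaxes.
syntax_map = {"python": ["scope:source.python"], "yaml": ["scope:source.yaml"]}
installed = {"scope:source.python"}
```

Listing several candidates per label would let the map fall back when a preferred syntax package is not installed.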
Settings
Uses AioSettings, an async settings watcher that:
- Merges per-project settings with plugin defaults
- Automatically recompiles rule collections when settings change
- Tracks per-window settings independently
Key settings groups:
- Strategy control: new_file_syntax, exec_file_syntax, trim_suffixes, magika.*
- Behavior: debounce, run_on_startup_views, enable_log
- Rules: syntax_rules, the array of user-defined SyntaxRule objects
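The merge-and-recompile behavior can be sketched synchronously as below. This is a simplification: the real AioSettings is asynchronous, and the SettingsWatcher class, its update method, and the shallow dict merge are assumptions made for this example.

```python
def merge_settings(defaults: dict, project: dict) -> dict:
    """Project values override plugin defaults (shallow merge)."""
    return {**defaults, **project}


class SettingsWatcher:
    """Tracks merged settings per window and fires a callback on change."""

    def __init__(self, defaults: dict, on_change) -> None:
        self._defaults = defaults
        self._on_change = on_change
        self._merged = {}  # window id -> last merged settings

    def update(self, window_id: int, project: dict) -> None:
        merged = merge_settings(self._defaults, project)
        if self._merged.get(window_id) != merged:
            self._merged[window_id] = merged
            self._on_change(window_id, merged)  # e.g. recompile that window's rules


recompiled = []
watcher = SettingsWatcher(
    {"debounce": 0.3, "enable_log": True},
    lambda window_id, merged: recompiled.append((window_id, merged["debounce"])),
)
watcher.update(1, {"debounce": 0.5})
watcher.update(1, {"debounce": 0.5})  # unchanged: no recompile
watcher.update(2, {})                 # another window falls back to defaults
```

Comparing against the last merged value per window avoids recompiling rules when a settings file is saved without effective changes.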
Shared Global State
    @dataclass
    class _GlobalState:
        startup_views: set[sublime.View]  # Views existing at startup
        syntax_rule_collections: WindowKeyedDict  # Per-window compiled rules
        dropped_rules_collection: WindowKeyedDict  # Per-window optimized-away rules

G = _GlobalState() is the single shared instance; it holds per-window compiled state.