Architecture & Design Guide¶
This guide describes the internals of src/execsql/ for contributors. The codebase was refactored from a 16,600-line monolith into a modular package; the importable module is execsql, the PyPI distribution is execsql2.
Execution Flow¶
When a user runs execsql2 script.sql mydb.sqlite -t l, the following sequence occurs:
flowchart TD
CLI["CLI entry point<br/><code>cli/__init__.py</code><br/>Typer parses args"]
RUN["<code>_run()</code><br/><code>cli/run.py</code><br/>Initialize state, config, subvars"]
CONF["Load configuration<br/><code>ConfigData</code><br/>Merge execsql.conf files"]
INIT["Initialize state<br/><code>state.initialize()</code><br/>Create singletons"]
PARSE["Parse script<br/><code>parse_script()</code> / <code>parse_string()</code><br/>Build AST tree"]
CONNECT["Connect to database<br/><code>_connect_initial_db()</code><br/>Via db/factory.py"]
EXEC["<code>execute(tree, ctx=)</code><br/><code>script/executor.py</code><br/>Walk AST nodes"]
SQL["SQL node<br/>Execute via current Database"]
META["Metacommand node<br/>Dispatch via <code>_state.metacommandlist</code>"]
DONE["Tree exhausted<br/>Close connections, exit"]
CLI --> RUN
RUN --> CONF
CONF --> INIT
INIT --> PARSE
PARSE --> CONNECT
CONNECT --> EXEC
EXEC --> SQL
EXEC --> META
EXEC -->|"recurse into IfBlock,<br/>LoopBlock, ScriptBlock,<br/>IncludeDirective"| EXEC
SQL --> EXEC
META --> EXEC
EXEC --> DONE
execute() recursively walks the AST via _execute_nodes() / _execute_node(). Each node type has dedicated handling: SQL statements run on the current Database; metacommands dispatch through _state.metacommandlist; IfBlock / LoopBlock drive their bodies based on tree structure; IncludeDirective parses and recurses into the included file; ScriptBlock registers a named script on ctx.ast_scripts for later EXECUTE SCRIPT lookup.
Module Map¶
flowchart LR
CLI["cli/<br/>Entry point,<br/>arg parsing,<br/>DSN parsing"]
API["api.py<br/>Public Python<br/>entry point"]
CONFIG["config.py<br/>ConfigData,<br/>StatObj,<br/>WriteHooks"]
STATE["state.py<br/>Global runtime<br/>singletons"]
SCRIPT["script/<br/>engine, control,<br/>variables, AST"]
META["metacommands/<br/>Dispatch table,<br/>~225 handlers"]
DB["db/<br/>Database ABC,<br/>9 adapters,<br/>DatabasePool"]
EXPORT["exporters/<br/>20+ output<br/>formats"]
IMPORT["importers/<br/>CSV, ODS,<br/>XLS, Feather"]
GUI["gui/<br/>Tkinter, Textual,<br/>Console backends"]
UTILS["utils/<br/>auth, crypto,<br/>fileio, mail,<br/>regex, strings"]
PARSER["parser.py<br/>CondParser,<br/>NumericParser"]
DEBUG["debug/<br/>REPL debugger"]
CLI --> CONFIG
CLI --> STATE
CLI --> SCRIPT
CLI --> DEBUG
API --> STATE
API --> SCRIPT
SCRIPT --> STATE
SCRIPT --> META
META --> STATE
META --> DB
META --> EXPORT
META --> IMPORT
META --> GUI
META --> UTILS
DB --> STATE
EXPORT --> STATE
GUI --> STATE
SCRIPT --> PARSER
Package summary¶
| Package | Purpose |
|---|---|
cli/ |
Typer app, _run() orchestration, DSN URL parsing, Rich help output, --lint entry points |
api.py |
Public execsql.run() Python entry point for notebooks, pipelines, and library use |
config.py |
ConfigData (INI merging), StatObj (runtime flags), WriteHooks (stdout/stderr redirection) |
state.py |
Thread-local runtime store — all shared mutable state lives here, isolated per-thread |
script/ |
AST node types, parser, MetaCommandList, SubVarSet, BatchLevels, ScriptExecSpec, set_system_vars() |
metacommands/ |
build_dispatch_table(), all x_* handlers, build_conditional_table(), all xf_* predicates |
db/ |
Database ABC, DatabasePool, 9 adapter modules (postgres, sqlite, duckdb, mysql, sqlserver, oracle, firebird, access, dsn) |
exporters/ |
ExportRecord, ExportMetadata, WriteSpec, 20+ format writers (CSV, JSON, XML, HTML, etc.) |
importers/ |
CsvFile, OdsFile, XlsFile, FeatherFile — data import backends |
gui/ |
GuiBackend ABC, TkinterBackend, TextualBackend, ConsoleBackend |
utils/ |
Shared utilities: file I/O, encryption, mail, regex helpers, string manipulation, timers |
parser.py |
Recursive-descent parsers for conditional (IF) and arithmetic (SET) expressions |
types.py |
DataType subclasses and DbType per-DBMS type dialect mappings |
models.py |
Column, DataTable, JsonDatatype |
format.py |
execsql-format CLI — opinionated formatter for execsql scripts |
exceptions.py |
ExecSqlError base, ErrInfo, ConfigError, DataTypeError, DbTypeError, etc. |
plugins.py |
Entry-point plugin discovery for metacommands, exporters, and importers |
debug/ |
Interactive REPL debugger for stepping through script execution |
data/ |
Bundled package data (execsql.conf.template powers --init-config) |
AST Parser and Executor¶
execsql2 parses scripts into an AST and walks the tree to execute. There is no separate flat-command-list engine; the AST executor is the only path.
script/ast.py— defines 9Nodesubclasses (SqlStatement,MetaCommandStatement,Comment,IfBlock,LoopBlock,BatchBlock,ScriptBlock,SqlBlock,IncludeDirective) plus theScriptroot and supporting types (SourceSpanfor source locations,ConditionModifierfor ANDIF/ORIF,ElseIfClause,ParamDef).script/parser.py—parse_script(path)/parse_string(text)produce aScripttree. All block structures (IF/LOOP/BATCH/SCRIPT/SQL) are resolved at parse time into nested nodes; the parser also reports structural errors (unmatched blocks).script/executor.py—execute(tree, ctx=)walks the tree via_execute_nodes()/_execute_node(). SQL and metacommands delegate to the existing dispatch tables. INCLUDE'd files are parsed and recursed into natively; circular INCLUDE references are detected viactx.include_chainand reported. Named SCRIPT blocks register inctx.ast_scripts(instance-scoped) for laterEXECUTE SCRIPTlookup.ON ERROR_HALT/ON CANCEL_HALT EXECUTE SCRIPTdeferred scripts also run through the AST executor.cli/lint.py— AST-based linter for variable and INCLUDE checks plus the Rich result printer; structural validation lives in the parser.
Use --parse-tree to print the AST without executing.
RuntimeContext¶
RuntimeContext (in state.py) holds the per-run mutable state. execute() accepts an explicit ctx; the active_context() context manager installs one as the active thread-local so metacommand handlers and database adapters resolve against it automatically. Each thread gets its own context via threading.local(), enabling concurrent execsql.run() calls. The context carries the AST script registry (ast_scripts), the include cycle detector (include_chain), and the unified execution stack (ast_exec_stack) — a list of ExecFrame records describing every active scope, IF/LOOP/BATCH block, and INCLUDE'd file. Scope frames (kind="main" / kind="script") hold the active localvars and paramvals; block frames cache a scope_ref to the enclosing scope for O(1) variable lookup. current_script_line() reads ctx.last_command, which the executor updates per statement.
Plugin System¶
execsql supports plugins via Python entry points. Plugins can register custom metacommands, export formats, and import formats.
- Entry point groups:
execsql.metacommands,execsql.exporters,execsql.importers - Discovery:
plugins.discover_metacommand_plugins()is called duringstate.initialize(). Exporter/importer plugins are discovered viadiscover_exporter_plugins()/discover_importer_plugins(). - Error handling: Broken plugins are logged and skipped -- they cannot prevent execsql from starting.
- Template:
extras/plugin-template/provides a starting point for creating plugins.
See --list-plugins to view discovered plugins.
Metacommand Dispatch¶
Metacommands are lines in SQL scripts prefixed with -- !x!. At import time, metacommands/__init__.py calls build_dispatch_table(), which populates a MetaCommandList with all mcl.add() registrations (~225 regex patterns). This singleton is stored as _state.metacommandlist.
How dispatch works¶
For each MetaCommandStatement AST node, the executor calls _exec_metacommand() in script/executor.py, which delegates to _state.metacommandlist.eval(cmd_str). The dispatcher extracts the leading keyword, narrows ~225 entries to a small candidate set via a keyword index, then tests each candidate's compiled regex against the full command. The matched handler is called with the regex's named groups as keyword arguments plus metacommandline (the original unmodified line), and its hit counter is incremented.
Handler conventions¶
x_*functions are metacommand handlers (e.g.,x_export,x_connect_pg).xf_*functions are conditional test predicates (e.g.,xf_tableexists,xf_contains).- All handlers accept
**kwargs-- keys come from named regex groups. - Handlers raise
ErrInfofor expected failures; the dispatch layer handles error propagation based onhalt_on_metacommand_err.
For a step-by-step guide to adding a new metacommand, see Adding Metacommands.
Conditional Expressions¶
The IF/ELSEIF/ELSE/ENDIF metacommands control conditional execution structurally — they are parsed into IfBlock AST nodes by script/parser.py and dispatched by _execute_if() in script/executor.py.
How IF blocks execute¶
_execute_if() evaluates the IF condition (and any ANDIF/ORIF modifiers) via CondParser, then walks the chosen branch (the IF body, one of the ELSEIF clauses, or the ELSE body). For tracking and REPL introspection it pushes a non-scope ExecFrame (kind="if"/"elseif"/"else") onto ctx.ast_exec_stack whose scope_ref points at the enclosing SCRIPT/main scope; the frame is popped when the branch finishes.
Because branching is structural, the executor never consults dispatch-time flags to decide whether an IF/ELSE/ELSEIF/ENDIF/ANDIF/ORIF handler should fire — the parser has already chosen the live branch. The corresponding dispatch entries (and the BREAK / BEGIN BATCH / END BATCH entries) remain registered as ErrInfo stubs so that a parser regression which let one of those metacommands fall through to the dispatch table would fail loudly rather than silently. (The pre-AST flat-command-list engine relied on a _state.if_stack + IfLevels state machine and matching run_when_false / run_in_batch flags on MetaCommand; both were removed once the AST became the sole execution engine.)
CondParser¶
The CondParser class in parser.py is a recursive-descent parser that evaluates boolean expressions in IF and ELSEIF metacommands. It builds an AST of CondAstNode objects supporting AND, OR, NOT, and conditional-test leaves.
Each conditional leaf is matched against the conditional dispatch table (_state.conditionallist), which works identically to the metacommand dispatch table but contains xf_* predicates that return bool.
xf_* predicates¶
Conditional test functions live in metacommands/conditions.py. Examples:
xf_tableexists-- checks if a table exists in the databasexf_fileexists-- checks if a file exists on diskxf_contains-- string containment testxf_equals,xf_isgt,xf_isgte— comparison testsxf_startswith-- string prefix test
These are registered in build_conditional_table() with category="condition" for keyword introspection.
Substitution Variables¶
SubVarSet (in script/variables.py) holds substitution variables in _state.subvars. Before each command runs, substitute_vars() in engine.py replaces all !!var!! patterns with their current values. The deferred form !{var}! is expanded at the point of use rather than at parse time, which is essential inside loops where the value changes on each iteration. See Substitution Variables for user-facing syntax and types.
Scoping classes¶
SubVarSet— global scope, shared across all scripts.LocalSubVarSet— per-ExecFrameoverlay for~-prefixed variables; lives on the active SCRIPT/main scope frame and is freed when the frame is popped. Retrieved via_state.current_localvars().ScriptArgSubVarSet— per-script#-prefixed arguments set byEXECUTE SCRIPT; lives on the SCRIPT-kindExecFrame. Retrieved via_state.current_paramvals().CounterVars— auto-incrementing$COUNTER_Nvariables on_state.counters.
Database Abstraction¶
Database ABC¶
db/base.py defines the Database abstract base class. Every DBMS adapter subclasses it and implements:
open_db()-- Establish the connection using the appropriate driver.exec_cmd()-- Execute a stored procedure or function.- Driver-specific overrides for
table_exists(),get_table_list(),role_exists(), etc.
The base class provides shared implementations for execute(), select_data(), select_rowsource(), select_rowdict(), commit(), rollback(), and populate_table().
DatabasePool¶
DatabasePool is a dict-like container mapping string aliases to open Database instances. It is the canonical _state.dbs object. Key operations:
add(alias, db)-- Register a connection (the first one becomesinitialandcurrent).current()-- Return the active database.make_current(alias)-- Switch the active database (viaUSEmetacommand).disconnect(alias)-- Close and remove a connection.closeall()-- Roll back and close everything at exit.
Adapter loading¶
db/factory.py provides convenience constructors (db_Postgres, db_SQLite, db_DuckDB, etc.) that are called from _connect_initial_db() in cli/run.py based on the db_type flag. Additional connections are created by CONNECT metacommands at runtime.
For a step-by-step guide to adding a new database adapter, see Adding Database Adapters.
Export/Import Pipeline¶
Export flow¶
- An
EXPORTmetacommand is parsed and dispatched tox_export(inmetacommands/data.py). - The handler determines the output format from the metacommand arguments.
- It calls
select_data()orselect_rowsource()on the current database to fetch results. - The appropriate exporter module (e.g.,
exporters/delimited.py,exporters/json.py) formats and writes the output. - File writing is asynchronous via
FileWriter, a background process that serializes writes through a shared queue.
Import flow¶
- An
IMPORTmetacommand is dispatched to the appropriate handler. - The handler instantiates the correct importer (e.g.,
importers/csv.py,importers/ods.py) based on the file type. - The importer reads the file and yields rows.
Database.populate_table()bulk-inserts the rows usingexecutemany(), with optional progress bars.
Export metadata¶
ExportMetadata (in exporters/base.py) tracks per-export metadata — file names, row counts, timestamps. The $SHEETS_* system variables surface multi-sheet IMPORT results; see Substitution Variables for the full list.
For guides on extending these pipelines, see Adding Exporters and Adding Importers.
GUI Subsystem¶
execsql supports three GUI backends for interactive prompts (password dialogs, pause screens, data displays, console windows):
| Backend | Module | When used |
|---|---|---|
TkinterBackend |
gui/desktop.py |
Default when --gui-framework tkinter or Tkinter is available |
TextualBackend |
gui/tui.py |
When --gui-framework textual is specified |
ConsoleBackend |
gui/console.py |
Fallback when no GUI framework is available |
All backends implement the GuiBackend ABC (gui/base.py), which defines a dispatch() method that routes GuiSpec requests to type-specific handlers (show_msg, show_pause, show_credentials, show_entry_form, etc.).
GUI manager thread¶
When --visible-prompts is set to level 1 or higher, a GUI manager thread (_state.gui_manager_thread) runs alongside the script execution thread. GUI requests are sent via _state.gui_manager_queue and responses are returned synchronously. The utils/gui.py module provides the public API (gui_connect, gui_console_on, gui_console_off, etc.) that metacommand handlers call.
Global State¶
state.py is the thread-local mutable state store for the runtime. All other modules access it via:
This import pattern (always as _state, always accessed inside function/method bodies) avoids circular imports at load time. The module stores all mutable state in a threading.local() instance, so each thread gets its own isolated RuntimeContext. The module provides two management functions:
initialize(config, dispatch_table, conditional_table)-- Called once from_run()to create runtime singletons (DatabasePool,CounterVars,BatchLevels, etc.).reset()-- Tears down all state for test isolation.
Key state variables¶
| Variable | Type | Purpose |
|---|---|---|
conf |
ConfigData |
Merged configuration |
subvars |
SubVarSet |
Global substitution variables |
ast_exec_stack |
list[ExecFrame] |
Unified execution stack (scopes + IF/LOOP/BATCH/INCLUDE) |
ast_scripts |
dict[str, ScriptBlock] |
Named scripts from BEGIN/END SCRIPT (AST registry) |
last_command |
ScriptCmd \| None |
Most-recently-executed statement; powers current_script_line() |
dbs |
DatabasePool |
Open database connections |
metacommandlist |
MetaCommandList |
Metacommand dispatch table |
conditionallist |
MetaCommandList |
Conditional predicate dispatch table |
counters |
CounterVars |
Auto-incrementing counters |
filewriter |
FileWriter |
Background file-writing process |
exec_log |
Logger |
Execution log |
Configuration¶
ConfigData¶
ConfigData (in config.py) reads INI-format execsql.conf files from four locations (in order, later values override earlier):
- System-wide (platform-dependent)
- User home directory
- Script file directory
- Current working directory
It exposes all recognized options as attributes (db_type, server, db, db_file, username, script_encoding, output_encoding, etc.).
CLI flag precedence¶
CLI flags (-t, -u, -p, --dsn, etc.) override config-file values. The --dsn connection string is parsed first and provides defaults that individual flags can further override. This precedence chain is implemented in _run():
execsql.conf (system) < execsql.conf (user) < execsql.conf (script dir)
< execsql.conf (cwd) < --dsn < individual CLI flags
CONFIG metacommands¶
At runtime, scripts can modify configuration via CONFIG metacommands (e.g., CONFIG TRIM_STRINGS ON, CONFIG SCAN_LINES 200). These modify _state.conf attributes directly and take effect for all subsequent commands.
Further Reading¶
- Adding Metacommands -- Step-by-step guide for new metacommands
- Adding Exporters -- How to add a new export format
- Adding Database Adapters -- How to support a new DBMS
- Adding Importers -- How to add a new import format