Why Regex-Based Linters Fall Short for SystemVerilog/UVM — A Case for Parser-Based Tools
Linting SystemVerilog and UVM testbench code is crucial to maintain quality and compatibility.
Many teams start by writing quick regex or string-search scripts to catch problematic patterns—like deprecated variables, disallowed constructs, or outdated API usage.
While regex-based linting can be tempting due to its simplicity, it often leads to false failures and missed issues because SystemVerilog’s syntax and preprocessing are too complex for simple text matching.
In this series, we explore common linting scenarios. We introduced the series in an earlier post: https://asfigo.blogspot.com/2025/02/linting-systemverilog-testbench-code.html
This post is the next installment. It uses a concrete example, detecting deprecated UVM constructs such as uvm_top, to demonstrate why a proper parser-based approach using tools like Google’s Verible is superior.
A Quick Regex-Style Lint Example
Take uvm_top as the example. This variable was removed in UVM 1.2, so any appearance should be flagged.
At first glance, a simple Python script based on string matching looks attractive:
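import sys
# Flag every line of the file named on the command line that mentions uvm_top
with open(sys.argv[1]) as f:
    for lineno, line in enumerate(f, 1):
        if "uvm_top" in line:
            print(f"{lineno}: uvm_top usage")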
This is quick to write and run. Unfortunately, it is also fragile and, in some cases, wrong. SystemVerilog is not a regular language: syntax, comments, strings, macros, and preprocessing can all fool naïve text scans.
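A natural reaction is to patch the script: add word boundaries so that identifiers such as my_uvm_top_cfg no longer match, and strip // comments before searching. The sketch below tries exactly that; the helper name patched_scan and the sample lines are illustrative, not taken from a real codebase. Each patch fixes one case and exposes another: the word boundary still matches inside a string literal, and the comment stripper cuts a line at the // inside a URL string, hiding a genuine usage.

```python
import re

# Word boundaries stop substring hits such as "my_uvm_top_cfg".
USAGE = re.compile(r"\buvm_top\b")
# Naive comment stripping: drop everything after the first "//" on a line.
LINE_COMMENT = re.compile(r"//.*")


def patched_scan(lines):
    """Return the 1-based line numbers flagged by the patched regex check."""
    hits = []
    for lineno, line in enumerate(lines, 1):
        code = LINE_COMMENT.sub("", line)
        if USAGE.search(code):
            hits.append(lineno)
    return hits


sample = [
    "my_uvm_top_cfg cfg;",                          # substring: fixed by \b
    "// TODO: remove uvm_top usage",                # comment: fixed by stripping
    '`uvm_info("CFG", "uvm_top used", UVM_LOW)',    # string: still flagged
    '$display("see http://x"); uvm_top.print();',   # real usage: now missed
]

print(patched_scan(sample))  # -> [3]
```

Every further patch (tracking quotes, handling /* */ blocks, following `ifdef) moves the script closer to a hand-written lexer and preprocessor, which is the job a real parser already does.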
Where Regex Fails
Below are real scenarios where regex-based matching either reports a false failure or misses an actual issue.
1. Inside Comments
// TODO: remove uvm_top usage before release
- Regex: flags — false failure
- Real Need: ignore — AST node is a comment.
2. Inside String Literals
`uvm_info("CFG", "Legacy path: uvm_top used here", UVM_LOW)
- Regex: flags — false failure
- Real Need: ignore — AST node is a string literal.
3. In Macro Definitions
`define LEGACY_TOP uvm_top
- Regex: flags — false failure
- Real Need: ignore — rule can target only symbol usage in procedural/structural code.
4. Inactive Preprocessor Branches
`ifdef UVM_11D
uvm_top.do_something();
`endif
If UVM_11D is not defined when building:
- Regex: flags — false failure
- Real Need: code is absent from AST after preprocessing.
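The four scenarios above are easy to reproduce. This sketch wraps the substring check from the earlier script in a function (naive_hits is an illustrative name) and runs it on each snippet from the list. Every snippet is flagged, although none of them is a live use of uvm_top in a build where UVM_11D is undefined.

```python
def naive_hits(text):
    """Line numbers where the plain substring check fires."""
    return [n for n, line in enumerate(text.splitlines(), 1) if "uvm_top" in line]


# The four snippets from the list above; none is a live usage when
# UVM_11D is not defined.
SCENARIOS = {
    "comment": "// TODO: remove uvm_top usage before release",
    "string": '`uvm_info("CFG", "Legacy path: uvm_top used here", UVM_LOW)',
    "macro definition": "`define LEGACY_TOP uvm_top",
    "inactive branch": "`ifdef UVM_11D\nuvm_top.do_something();\n`endif",
}

for name, snippet in SCENARIOS.items():
    print(f"{name:17} -> flagged at lines {naive_hits(snippet)}")
```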
Introducing Verible
Verible is an open-source SystemVerilog parser and tooling framework, originally developed at Google and now hosted by the CHIPS Alliance.
It provides a standalone SystemVerilog parser that produces an Abstract Syntax Tree (AST) representing the true syntactic structure of the code after preprocessing.
This means Verible can differentiate between comments, strings, macro expansions, and active code.
Its robustness and active maintenance make it a reliable foundation for building linting tools.
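To make the comment/string/identifier distinction concrete, here is a deliberately tiny lexer sketch in Python. It is not Verible. It covers only a small subset of SystemVerilog lexical rules (line and block comments, double-quoted strings with escapes, compiler directives, plain identifiers) and ignores much more, such as escaped identifiers and based numbers. The token kind names are invented for this example. Even this subset shows why classification has to happen before matching: once every token carries a kind, deciding whether a given uvm_top is an identifier becomes a simple check.

```python
import re
from typing import List, Tuple

# Token kinds for this toy example only; Verible uses its own, much richer set.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),
    ("STRING", r'"(?:\\.|[^"\\\n])*"'),
    ("DIRECTIVE", r"`[A-Za-z_][A-Za-z0-9_$]*"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_$]*"),
    ("SPACE", r"\s+"),
    ("OTHER", r"."),
]
# Alternatives are tried in order, so comments and strings win over identifiers.
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC), re.DOTALL)


def tokenize(src: str) -> List[Tuple[str, str]]:
    """Split SystemVerilog-like text into (kind, text) pairs, dropping whitespace."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(src)
            if m.lastgroup != "SPACE"]


src = '// uses uvm_top\n`uvm_info("CFG", "uvm_top here", UVM_LOW)\nuvm_top.print();'
for kind, text in tokenize(src):
    print(f"{kind:9} {text!r}")
```

In the sample, uvm_top appears three times in the text: once inside a COMMENT, once inside a STRING, and only once as an IDENT token, which is the only occurrence a lint rule should care about.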
Why Parsing Wins
Requirement | Regex | Verible Parser |
---|---|---|
Ignore comments/strings | ❌ false failures | ✅ correct |
Honour preprocessor defines | ❌ false failures | ✅ correct |
Match only valid identifiers | ❌ false failures | ✅ correct |
Maintainable for complex rules | ❌ brittle | ✅ extensible |
Regex linting is brittle because it works purely at the text level.
In SystemVerilog/UVM, syntactic and preprocessor context matters — without a parser, you cannot reliably distinguish a real symbol usage from a harmless mention.
Verible-Based Approach
A Verible rule can be implemented to traverse the parsed syntax tree and match only relevant identifiers in the correct context:
- Operates on the source as the compiler actually sees it, respecting preprocessor defines.
- Has clear access to token type (comment, string, identifier, etc.).
- Avoids accidental matches in inactive code, macros, or comments.
- Can be extended to match specific contexts (e.g. assignments only).
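The properties listed above can be illustrated with a small, self-contained Python sketch. It does not call Verible. The token stream is written by hand, and the (line, kind, text) format, the kind names and find_uvm_top are assumptions made for this example. What it shows is the shape of a context-aware rule: conditional-compilation state decides which tokens are live, `define lines are skipped, and only tokens whose kind is IDENT are compared against uvm_top.

```python
from typing import List, Optional, Set, Tuple

# Assumed, simplified token format: (line, kind, text). A real parser such as
# Verible exposes richer information; the list below is written by hand.
Token = Tuple[int, str, str]


def find_uvm_top(tokens: List[Token], defines: Set[str]) -> List[int]:
    """Return lines where uvm_top is used as an identifier in active code."""
    # One [parent_active, condition, in_else] entry per open `ifdef/`ifndef.
    stack: List[List[bool]] = []

    def active() -> bool:
        if not stack:
            return True
        parent, cond, in_else = stack[-1]
        return parent and (cond != in_else)

    hits: List[int] = []
    define_line: Optional[int] = None  # line holding a `define name and body
    i = 0
    while i < len(tokens):
        line, kind, text = tokens[i]
        if line == define_line:
            pass  # macro name or body on a `define line: skipped
        elif kind == "DIRECTIVE" and text in ("`ifdef", "`ifndef"):
            name = tokens[i + 1][2]  # the macro name follows the directive
            cond = (name in defines) == (text == "`ifdef")
            stack.append([active(), cond, False])
            i += 1  # also consume the macro name
        elif kind == "DIRECTIVE" and text == "`else":
            stack[-1][2] = True
        elif kind == "DIRECTIVE" and text == "`endif":
            stack.pop()
        elif kind == "DIRECTIVE" and text == "`define":
            define_line = line
        elif kind == "IDENT" and text == "uvm_top" and active():
            hits.append(line)  # comments and strings never reach this branch
        i += 1
    return hits


# Hand-written tokens (punctuation omitted) for these source lines:
#   line 1: // TODO: remove uvm_top usage
#   line 2: `define LEGACY_TOP uvm_top
#   line 3: `ifdef UVM_11D
#   line 4: uvm_top.do_something();
#   line 5: `endif
#   line 6: `uvm_info("CFG", "uvm_top used here", UVM_LOW)
#   line 7: uvm_top.print();
SAMPLE: List[Token] = [
    (1, "COMMENT", "// TODO: remove uvm_top usage"),
    (2, "DIRECTIVE", "`define"), (2, "IDENT", "LEGACY_TOP"), (2, "IDENT", "uvm_top"),
    (3, "DIRECTIVE", "`ifdef"), (3, "IDENT", "UVM_11D"),
    (4, "IDENT", "uvm_top"), (4, "IDENT", "do_something"),
    (5, "DIRECTIVE", "`endif"),
    (6, "MACRO_CALL", "`uvm_info"), (6, "STRING", '"CFG"'),
    (6, "STRING", '"uvm_top used here"'), (6, "IDENT", "UVM_LOW"),
    (7, "IDENT", "uvm_top"), (7, "IDENT", "print"),
]

print(find_uvm_top(SAMPLE, set()))        # -> [7]
print(find_uvm_top(SAMPLE, {"UVM_11D"}))  # -> [4, 7]
```

With no defines, only the live call on line 7 is reported. Defining UVM_11D also reports line 4. A production rule would receive these tokens, or better, syntax-tree nodes, from the parser instead of a hand-written list, and could narrow the match further, for example to assignment contexts.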
Leveraging Verible with AsFigo BYOL
While Verible provides the core parsing and AST infrastructure, building and maintaining lint rules at scale requires more tooling support.
AsFigo’s Build Your Own Linter (BYOL) framework accelerates this process by offering a modular environment to develop, test, and deploy custom Verible-based lint rules tailored to your codebase.
BYOL handles AST traversal, rule management, diagnostics reporting, and CI integration out of the box — letting engineers focus on the lint logic itself rather than infrastructure.
Using BYOL with Verible means you avoid reinventing the wheel and reduce the ongoing maintenance burden of home-grown scripts, resulting in faster and more reliable lint coverage.
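The division of labour described above can be sketched generically: the framework owns rule registration, traversal and reporting, while the rule author writes only the matching logic. This is not BYOL’s actual API, which this post does not show. Diagnostic, the rule decorator and run_rules are hypothetical names, and the (line, kind, text) token format is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

Token = Tuple[int, str, str]  # (line, kind, text): an assumed token format


@dataclass(frozen=True)
class Diagnostic:
    rule: str
    line: int
    message: str


RuleFn = Callable[[List[Token]], Iterator[Diagnostic]]
RULES: Dict[str, RuleFn] = {}  # framework-owned registry (hypothetical)


def rule(name: str) -> Callable[[RuleFn], RuleFn]:
    """Decorator that registers a rule function under a stable name."""
    def register(fn: RuleFn) -> RuleFn:
        RULES[name] = fn
        return fn
    return register


@rule("no-uvm-top")
def no_uvm_top(tokens: List[Token]) -> Iterator[Diagnostic]:
    # Rule author's part: matching logic only, no file handling or printing.
    for line, kind, text in tokens:
        if kind == "IDENT" and text == "uvm_top":
            yield Diagnostic("no-uvm-top", line, "uvm_top was removed in UVM 1.2")


def run_rules(tokens: List[Token]) -> List[Diagnostic]:
    """Framework part: run every registered rule and return sorted diagnostics."""
    found: List[Diagnostic] = []
    for fn in RULES.values():
        found.extend(fn(tokens))
    return sorted(found, key=lambda d: (d.line, d.rule))


tokens = [(3, "COMMENT", "// uvm_top"), (7, "IDENT", "uvm_top")]
for d in run_rules(tokens):
    print(f"{d.line}: [{d.rule}] {d.message}")
```

Adding another check then means writing one more decorated function; the driver, the sorting and the output format stay unchanged.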
Recommendation
For one-off local greps, regex may be fine.
For production lint rules that run in CI and gate merges, use a proper parser such as Verible — ideally within a framework like AsFigo BYOL.
It will save you from false failures, missed detections, and the constant maintenance cost of chasing corner cases.