CS292C-12: Computer-Aided Reasoning for Software

CS292C-12: Computer-Aided Reasoning for Software
Lecture 12: Conflict-Driven Program Synthesis
YF
by Yu Feng
 
Roadmap for Today's Lecture
Why "conflict-driven" synthesis?
Understanding the limitations of standard approaches and the benefits of learning from conflicts
Deep dive on CD-learning paper
Exploring architecture, key definitions, algorithm loop, examples, and evaluation from the PLDI '18 paper
Bridge to crypto kernels
Connecting synthesis techniques to cryptographic implementations with specifications, performance, and constant-time requirements
Case study on modular multiply
Examining the synthesis and optimization of cryptographic primitives
Why Existing Enumerative Synthesizers Struggle
1
No "memory" of past failures
Classic enumerative + deductive tools (e.g. Sketch-style, Morpheus, DeepCoder) discard spurious programs after a single check.
2
Can't generalize a conflict
When an SMT check shows a partial program is impossible, current systems learn nothing beyond "this instance fails."
3
Enumerative search explodes despite heuristics
Even with cost-based orderings or statistical priors, the space grows combinatorially; deduction alone can't keep pace.
4
Domain-specific pruning is brittle
Hand-crafted abstractions (table schemas, liquid types, etc.) prune some paths but don't transfer across DSLs. Overly precise specs slow SMT, while coarse specs miss pruning opportunities.
Does this ring a bell?
But those problems were solved by SAT solvers!
Modern SAT solvers use conflict-driven clause learning (CDCL) to efficiently navigate vast search spaces.
When a conflict occurs, these solvers extract a "reason" for the failure and add it to their knowledge base to avoid similar dead ends.
This ability to learn from failures and generalize conflicts revolutionized SAT solving - what if we could apply the same principles to program synthesis?
Why Conflict-Driven Synthesis Matters
Problem with Standard Approaches
Traditional enumerative and neural-guided search methods tend to repeat the same mistakes, wasting computational resources and time exploring invalid program spaces.
Learning from SAT/SMT Solvers
SAT/SMT solvers avoid repetitive mistakes through Conflict-Driven Clause Learning (CDCL), which systematically learns from failures to prune the search space.
Key Innovation
The goal is to bring this powerful "learning from conflicts" approach to program synthesis, creating more efficient search algorithms that avoid repeating mistakes.
A significant advantage of this framework is its domain-specific language (DSL) agnostic nature, making it applicable across diverse domains from string manipulation and table transformations to list operations and cryptographic implementations.
From CDCL Solver to Synthesizer
Neo's architecture directly adapts key principles from Conflict-Driven Clause Learning (CDCL) SAT solvers. It incorporates conflict analysis to derive insights from failed synthesis attempts, conflict-driven backtracking to efficiently navigate the search space, and learned clauses to prune large portions of invalid program space, all while maintaining completeness guarantees.
Feng et al. "Program Synthesis using Conflict-Driven Learning" PLDI '18
High-Level Architecture of NEO
Decide
Choose a hole and production rule consistent with knowledge base
Deduce
Apply logical consequences of current decisions
AnalyzeConflict
Identify the source of conflicts when they arise
Backtrack
Return to earlier decision point with new knowledge
This architecture mirrors the Conflict-Driven Clause Learning (CDCL) approach used in SAT solvers. The Knowledge Base (Ω) maintains learnt lemmas, similar to clauses in SAT solving. The system verifies program feasibility using SMT to check if the program constraints (Φₚ) satisfy the specification (Φ).
When conflicts are detected, NEO learns lemmas that block entire families of spurious programs, dramatically pruning the search space.
Example: Synthesis Task & Domain-Specific Language
Goal: Synthesize computeKSum(list, k) function that calculates the sum of the k largest values in a soccer league list.
Example: [49, 62, 82, 54, 76], k = 2 → 158 (82 + 76)
Mini-DSL (root symbol N) for list transformations:
sort, reverse, take, filter, head, last, sum, max, min ... with integer grammars for indices and predicates.
Each function comes with a first-order specification (size/max relations) — e.g.,
take: y.size < x1.size ∧ y.max ≤ x1.max ∧ x2>0.
Example: Conflict Detection in Partial Programs
Synthesis begins with a partial abstract syntax tree (AST):
head( take( filter( x1 , ? ) , ? ) ) with unfilled holes (?)
System composes component specifications to create a path condition Φₚ; for example:
y ≤ v1.max, v1.max ≤ v3.max, ... combined with I/O specification x1=[49,62,82,54,76] ∧ y=158
SMT solver check reveals unsatisfiable constraint (contradiction: y ≤ 82 yet y = 158)
⇒ This partial assignment represents a conflict
Example — Learning Efficiently from Conflicts
NEO identifies a Minimal Unsatisfiable Core, precisely isolating the predicates causing the conflict.
It recognizes components that are equivalent modulo conflict: replacing head or take with any operator having identical size/max specifications would still result in infeasibility.
NEO generates a powerful lemma (conflict clause) that eliminates entire families of ASTs, dramatically pruning the search space — similar to clause learning in CDCL SAT solvers but specifically optimized for DSL semantics.
Equivalence Modulo Conflict analysis determines that sort ≡ take ≡ reverse ≡ filter at node N₁, allowing NEO to learn a lemma that prunes 63 sibling ASTs in one operation.
DSL and Semantics in NEO
Context-Free Grammar
The Domain-Specific Language (DSL) is defined by a context-free grammar G that specifies the syntax of valid programs in the target domain.
This grammar determines the space of possible programs that the synthesis algorithm will explore.
Component Specifications
Each component χ in the DSL has an associated first-order specification Ψ(χ) that defines its semantics in terms of relationships between inputs and outputs.
For example, the take operation might have the specification: y.size < x.size ∧ y.max ≤ x.max
These specifications are crucial for both the Deduce and AnalyzeConflict phases of the algorithm. They allow the system to reason about program behavior and identify conflicts without executing the program, enabling efficient pruning of the search space.
Equivalence Modulo Conflict: The Key Innovation
Definition
Two components χ and χ′ are equivalent modulo conflict (χ ≡ₙ χ′) if replacing χ with χ′ at node N results in the same Minimal Unsatisfiable Core (MUC).
Verification Condition
This equivalence can be checked using the theorem: Ψ(χ′) ⇒ φ where (φ,N,χ) is in the MUC κ.
Pruning Power
When EMC is detected, NEO can learn a clause ¬c_{N,χ} ∧ ¬c_{N,χ′} that prunes multiple sibling ASTs simultaneously, dramatically reducing the search space.
This key insight allows NEO to generalize from a single conflict to identify and eliminate entire families of programs that would fail for the same fundamental reason, providing exponential pruning power compared to traditional approaches.
NEO Algorithm Loop
Decide
Pick a missing part of the program and fill it with a valid option from our knowledge base
Propagate
Use our current choices to figure out what other parts must be filled in
CheckConflict
See if our current program makes logical sense by checking it against our rules
AnalyzeConflict
If we find a problem, learn why it happened and add this lesson to our knowledge base
Backtrack
Go back to a previous decision point and try a different option
This process repeats until we either find a working program that meets all requirements, or we prove that no solution exists with our available building blocks.
Evaluation Results: NEO vs. Competitors
The evaluation results demonstrate NEO's significant advantages over previous state-of-the-art synthesizers. In data-wrangling tasks, NEO solved 90% of benchmarks compared to Morpheus's 64%, while taking only 19 seconds on average versus 68 seconds.
For list manipulation tasks, NEO achieved 71% completion versus DeepCoder's 32%, with half the average solution time. On particularly challenging cases, conflict-learning provided up to 17× speedups.
PART B: Cryptographic Function Synthesis & Optimisation
We now transition to the second major part of our lecture, focusing on the application of conflict-driven synthesis techniques to cryptographic implementations. This represents an exciting and practical application domain where correctness, performance, and security are all critical concerns.
Cryptographic kernels present unique challenges and opportunities for synthesis approaches, requiring not only functional correctness but also constant-time execution to prevent timing side-channel attacks, while still achieving high performance.
We'll explore how the conflict-driven learning approach can be adapted to this domain to produce correct-by-construction implementations that are both secure and efficient.
Why Focus on Cryptographic Implementations?
Performance Critical
Tiny cryptographic kernels like modular multiplication and S-boxes often dominate the runtime of cryptographic applications, making their optimization crucial for overall system performance.
Security Requirements
Cryptographic implementations must satisfy constant-time execution to prevent timing side-channel attacks, along with size constraints for embedded applications.
Implementation Challenges
Hand-optimization of cryptographic code is notoriously brittle and error-prone, creating a need for correct-by-construction approaches that guarantee both correctness and security.
Synthesis techniques offer a promising approach to address these challenges simultaneously, producing implementations that are provably correct, constant-time, and highly optimized for performance.
Synthesis Setup for Cryptographic Functions
Specification (Φ)
The synthesis specification must capture both functional correctness and timing-side-channel safety. For example, a modular multiplication function might be specified as: ∀x,y. 0≤x,y<2¹⁶ → mod_mult(x,y) = (x·y mod p)
Domain-Specific Language
The DSL for cryptographic implementations typically includes bitwise operations, limb arithmetic, rotate operations, add-with-carry instructions, and other low-level primitives available on the target architecture.
Cost Model
Performance is evaluated using appropriate metrics such as CPU cycles, gate count for hardware implementations, or micro-operations for specific targets like eBPF.
Conflict-driven learning is particularly well-suited to this domain because many algebraic variants of cryptographic implementations differ only in how they split operations across limbs or handle carries, creating opportunities for effective pruning through EMC.
Domain-Specific Language (DSL) for Fp² Expressions
Fp² is a quadratic extension field of the prime field Fp. In cryptography, Fp² represents complex numbers over a finite field, where elements are expressed as a + bi with a,b ∈ Fp. Fp² expressions are mathematical formulas that manipulate these field elements using operations like addition, multiplication, and squaring.
Expr ::= Var                   // a0, a1, xi, tmp…
      | Const                 // field constants
      | Add(Expr, Expr)       // + 
      | Sub(Expr, Expr)       // –
      | Sqr(Expr)             // x * x (cheap-mul)
      | Mul(Expr, Expr)       // general multiplication
      | MulConst(Const, Expr) // ξ * x (const-mul)
      | Let Var = Expr ; Expr // share sub-results
Type system. All expressions have type Fp².
Algebraic axioms (associative + commutative for Add/Sub; distributive) are fed as rewrite rules to the synthesizer.
Cost Model for Fp² Operations
Total cost = Sum of (unit cost × number of times used)
Example costs:
Basic approach: 2·Sqr + 1·Mul + 1·MulConst + 1·Add = 27
Improved version: 3·Sqr + 1·MulConst + 2·Add + 2·Sub = 26
The goal is to reduce this total cost while keeping the same results—using cheaper operations like squaring and addition instead of expensive multiplication when possible.
Example
Naive Implementation
def sq_fp4_naive(a0, a1, xi):
    c0 = a0*a0 + (a1*a1)*xi  
    c1 = 2 * a0 * a1
    return (c0, c1)
# 2 sqr + 1 mul + ξ-mul
Optimized Implementation
def sq_fp4_optimized(a0, a1, xi):
    t0 = a0*a0
    t1 = a1*a1
    c0 = t1 * xi + t0
    c1 = (a0 + a1)**2 - t0 - t1
    return (c0, c1)
# 3 sqr, no general mul
# A = a0 + a1*V where V^2 = ξ (Fp² coefficients)
When Does Crypto Synthesis Shine?
Algebraic Rich Operators
Domains with finite-field and polynomial operations that offer multiple implementation strategies with different performance characteristics
Semantic Equivalence
Scenarios with multiple semantically-equivalent sequences that differ significantly in cost, creating opportunities for optimization
Security Assurance
Applications requiring formal guarantees against side-channel attacks, overflow conditions, or other security vulnerabilities
Hardware Constraints
Implementations targeting specific hardware with unique instruction sets or performance characteristics
The combination of conflict-driven learning with e-graph equality saturation provides a powerful approach for synthesizing cryptographic implementations that are correct, secure, and highly optimized for the target platform.
Key Takeaways
1
Power of Conflict-Driven Learning
Conflict-Driven Learning uses lessons from SAT solving to make program synthesis better. It helps find solutions faster by learning from mistakes.
2
Equivalence Modulo Conflict
This new idea helps the system avoid dead ends by ruling out large parts of the search space that won't work, making synthesis much faster.
3
Broad Applicability
The NEO framework works better than specialized tools across many areas like strings, lists, and tables, showing how flexible it is.
4
Cryptographic Applications
These same methods work well for creating and improving cryptographic code, making programs that are both correct and fast.
This lecture showed a powerful way to create programs that works for everyday coding and security-critical cryptography. If you want to learn more, try using NEO with your own programming language or use it to create better cryptographic code.
Recommended Readings
Conflict-Driven Learning
Feng et al. "Program Synthesis using Conflict-Driven Learning" PLDI '18
CEGIS Foundations
Solar-Lezama PhD "Sketching" (2008) – CEGIS roots
Equality Saturation
Buss et al. "Egg: Fast and Extensible Equality Saturation" POPL '20
Cryptographic Applications
Häner et al. "Automatic optimization of secure and fast modular arithmetic" CCS '21