Fooling Near-Maximal Decision Trees

By William M. Hoza and Zelin Lv

Read the paper: ECCC

Abstract (for specialists)

For any constant $\alpha > 0$, we construct an explicit pseudorandom generator (PRG) that fools $n$-variate decision trees of size $m$ with error $\varepsilon$ and seed length $(1 + \alpha) \cdot \log_2 m + O(\log(1/\varepsilon) + \log \log n)$. For context, one can achieve seed length $(2 + o(1)) \cdot \log_2 m + O(\log(1/\varepsilon) + \log \log n)$ using well-known constructions and analyses of small-bias distributions, but such a seed length is trivial when $m \geq 2^{n/2}$. Our approach is to develop a new variant of the classic concept of almost $k$-wise independence, which might be of independent interest. We say that a distribution $X$ over $\{0, 1\}^n$ is $k$-wise $\epsilon$-probably uniform if every Boolean function $f$ that depends on only $k$ variables satisfies $\mathbb{E}[f(X)] \geq (1 - \epsilon) \cdot \mathbb{E}[f]$. We show how to sample a $k$-wise $\epsilon$-probably uniform distribution using a seed of length $(1 + \alpha) \cdot k + O(\log(1/\epsilon) + \log \log n)$.

Meanwhile, we also show how to construct a set $H \subseteq \mathbb{F}_2^n$ such that every feasible system of $k$ linear equations in $n$ variables over $\mathbb{F}_2$ has a solution in $H$. The cardinality of $H$ and the time complexity of enumerating $H$ are at most $2^{k + o(k) + \mathrm{polylog}(n)}$, whereas small-bias distributions would give a bound of $2^{2k + O(\log(n/k))}$.

By combining our new constructions with work by Chen and Kabanets (TCS 2016), we obtain nontrivial PRGs and hitting sets for linear-size Boolean circuits. Specifically, we get an explicit PRG with seed length $(1 - \Omega(1)) \cdot n$ that fools circuits of size $2.99 \cdot n$ over the $U_2$ basis, and we get a hitting set with time complexity $2^{(1 - \Omega(1)) \cdot n}$ for circuits of size $2.49 \cdot n$ over the $B_2$ basis.

Not-so-abstract (for curious outsiders)

⚠️ This summary might gloss over some important details.

A "pseudorandom generator" is an algorithm that flips a coin a few times and outputs a long sequence of bits that "appear random" in some sense. In this paper, we design a pseudorandom generator that outputs a sequence of bits that appear random from the perspective of any observer who only looks at $k$ of the bits. The challenge is that we have to generate all the bits without knowing which $k$ bits the observer will choose to look at. In fact, even the observer doesn't necessarily know in advance which $k$ bits they will look at, because they are allowed to take into account the bits they've seen so far when they are deciding which bit to look at next. Before our work, there were known pseudorandom generators that use approximately $2k$ coin flips to output such a sequence. Our new pseudorandom generator only uses approximately $1.01k$ coin flips.

I posted a single-author manuscript online in January 2025. Zelin and I posted an updated, jointly-authored manuscript in July 2025. The paper will appear at RANDOM in August 2025.

Expository material:

[Slides pptx] [Slides pdf]. These are slides from my presentation at the UChicago CS Theory Lunch (January 2025).