A Simpler Proof-Irrelevant Tiny Object

I had some more thoughts about my previous post. Here's a sketch of a little type theory with one built-in type $\T$, and two modalities $\surd$ and $\blat$. It's simpler than the last post in two ways: first, that we only need to work in one language and not two, and second, that there are way fewer substitution principles involved, and so fewer lemmas to check about substitutions being compatible with one another. We push a bit more complexity into the structure of contexts, but I think it's a net win.

Language

Here's the language: \[\begin{array}{rlcl} \textrm{Contexts} & \Gamma &\cc=& \cdot \celse \emptyset \celse \Gamma, x : A \celse \Gamma, x :: A\celse \Gamma, x \div A \celse \Gamma, \star \\ \textrm{Types} & A &\cc=& \cdots \celse \T \celse \surd A \celse \blat A \\ \textrm{Terms} & M &\cc=& \cdots \celse \star \celse \blet \star = M \bin N \celse \mathbf{in}_\surd M \celse \mathbf{el}_\surd M \celse \mathbf{in}_\blat M \celse \mathbf{el}_\blat M \end{array}\] A context can be empty $\cdot$ or it can be intrinsically contradictory $\emptyset$, and can be extended invarious ways. There are normal hypotheses $x : A$, and both strong $x :: A$ and weak $x \div A$ modal hypotheses. This notation is inspired by [Pfenning '01], though don't take it too literally; in my case it's not the $\div$-hypotheses where proof-irrelevance shows up.

There's also the possibility of adding a marker $\star$ to a context. This is in fact where the proof irrelevance shows up! The presence or absence of $\star$ in a context is all that matters, and that's why I don't even give it a variable name. It's merely a marker: I say that $\Gamma, \star$ is definitionally equal to $\Gamma, \star, \star$, and also we understand there to be an exchange property, that $\star$ is mobile and can commute around other hypotheses in $\Gamma$ without any discernible difference.

The Dream

With this vocabulary at least named, I can say what my motivation is. Perhaps with the right apparatus I can, internally to this type theory, prove things that feel like parametricity theorems. In particular, one thing I'm working up to convincing myself of is that there is, for any types $A$ and $B$, an isomorphism between \[ \surd(A \to B) \] and the dependent record type \[ \begin{array}{l} \{\\ \qquad f : (\T \to A) \to (\T \to B),\\ \qquad g : \surd A \to \surd B,\\ \qquad p : (a : \surd A) \to \lambda t . \mathbf{el}_\surd(g\ a) \equiv f\ (\lambda t.\mathbf{el}_\surd\ a)\\ \} \end{array} \] The intuition I have for this isomorphism is that having a term of type $\surd(A \to B)$ is like having a nice, closed, parametric term of type $A \to B$. The dependent record type is then an unpacking of what this is supposed to mean: $f$ is the underlying function. The type operator $\T \to \hbox{---}$ is, for reasons that I haven't made clear yet, interpreted as taking a "type that comes equipped with a parametricity relation" and yielding the underlying type. The function $g$ is the data of the parametricity property of the function: it says that if we have a "nice" $A$, i.e. an element of $\surd A$, then we map it to a nice $B$. The equation $p$ then says that $f$ and $g$ are compatible: $g$'s mapping of nice $A$s to nice $B$s actually correspond's to $f$'s mapping of arbitrary underlying $A$s to arbitrary underlying $B$s.

Thus the dream is that this idea could not only be achieved but extended to the ability to internally reason about types like \[ \surd((X : \rtype) \to X \to X \] \[ \surd((X : \rtype) \to (X \to X) \to X \to X) \] etc. and say something interesting about how they must contain identity functions or church numerals or the like, perhaps under additional assumptions about the types they are applied to.

Context Operations

We have a couple of operations that take a context and yield a context: $\Gamma^\ominus$ demotes all $x : A$ to merely $x \div A$, and $\Gamma^\oplus$ promotes all $x : A$ to $x :: A$. Demotion deletes $\star$, and promotion yields a contradictory context if it encounters a $\star$.

$(\Gamma, x : A)^\ominus = (\Gamma^\ominus, x \div A)$
$(\Gamma, x :: A)^\ominus = (\Gamma^\ominus, x :: A)$
$(\Gamma, x \div A)^\ominus = (\Gamma^\ominus, x \div A)$
$(\Gamma, \star)^\ominus = \Gamma^\ominus$

$(\Gamma, x : A)^\oplus = (\Gamma^\oplus, x :: A)$
$(\Gamma, x :: A)^\oplus = (\Gamma^\oplus, x :: A)$
$(\Gamma, x \div A)^\oplus = (\Gamma^\oplus, x \div A)$
$(\Gamma, \star)^\oplus = \emptyset$

Context Formation

\[ {\prov \Gamma \rctx \qquad \Gamma \prov A : \rtype \over \prov (\Gamma, x : A)\rctx} \qquad {\prov \Gamma \rctx \qquad \Gamma^\ominus \prov A : \rtype \over \prov (\Gamma, x :: A)\rctx} \qquad {\prov \Gamma \rctx \qquad \Gamma, \star \prov A : \rtype \over \prov (\Gamma, x \div A)\rctx} \]

Variable Typing

When can we use a variable? Here are the rules. \[ {x :A \in \Gamma \over \Gamma \prov x : A} \qquad {x :: A \in \Gamma \over \Gamma \prov x : A} \qquad {x \div A \in \Gamma \qquad \star \in \Gamma \over \Gamma \prov x : A} \] Normal hypotheses can be used normally. Strong hypotheses $x :: A$ can also be used normally; the way in which their strength is visible is not directly through the variable rule, but rather by how they interact with the context operation $\Gamma^\ominus$: strong hypotheses are resilient to being hit by $\Gamma^\ominus$, whereas normal hypotheses get demoted. Weak hypotheses can only be used if there is also a $\star$ marker currently in the context. Considering terminology of prior work with locks and keys, the $\star$ can be thought of as the "key" that "unlocks" $x \div A$, but we don't include any trace of this fact in the term language. I think that we can get away with this only because of the proof-irrelevance of the tiny object we're working with.

Term Typing for Tiny Object

Now we can give the intro and elim rules for the tiny type $\T$, which internalizes $\star$. \[ {\star \in \Gamma \over \Gamma \prov \star : \T} \qquad { \Gamma \prov M : \T \qquad \Gamma, \star \prov N : A \over \Gamma \prov \blet \star = M \bin N : A} \] We can introduce $\T$ whenever $\star$ is in the context, and if we have a term of type $\T$, then we can eliminate it and build the remaining continuation assuming that we have $\star$ in the context. We have $\beta$ and $\eta$ equalities like so: \[\blet \star = \star \bin M \mapsto_\beta M \qquad (M : \T) \mapsto_\eta \blet \star = M \bin \star\]

Term Typing for $\surd$

\[ { \Gamma^\ominus \prov M : A \over \Gamma \prov \mathbf{in}_\surd\ M : \surd A } \qquad { \Gamma, \star \prov M : \surd A \over \Gamma \prov \mathbf{el}_\surd\ M : A } \] Somehow $\surd$ is kind of like a modal $\square$: $\surd A$ is stronger than $A$. The intro rule for $\surd$ is unlike a Pfenning-Davies $\square$, though, in that it doesn't delete every nonmodal hypothesis in the context on the way up, but it does demote them. Then in the elimination rule, because $\surd A$ is stronger than $A$, we grant a favor to the prover trying to eliminate from a $\surd$: if you can prove $\surd A$ even in a more permissive context that comes with a key (i.e. includes $\star$), you're allowed to conclude $A$.

These rules witness that $\T \to \dash \dashv \surd$: Applying $\dash^\ominus$ to the context is like hitting it with $\T \to \dash$, and the elimination rule is morally the counit $(\T \to \surd A) \to A$.

We have $\beta$ and $\eta$ equalities like so: \[\mathbf{el}_\surd\ \mathbf{in}_\surd\ M \mapsto_\beta M \qquad M \mapsto_\eta \mathbf{in}_\surd\ \mathbf{el}_\surd\ M\]

Term Typing for $\blat$

\[ { \Gamma^\oplus \prov M : A \over \Gamma \prov \mathbf{in}_\blat\ M : \blat A } \qquad { \Gamma^\ominus \prov M : \blat A \over \Gamma \prov \mathbf{el}_\blat\ M : A } \] The modality $\blat$ is a monad. $\blat A$ is weaker than $A$. So in the introduction rule, we give the prover more power. If you can prove $A$ in the promoted context $\Gamma^\oplus$, then that's good enough for $\blat A$. Conversely, we take away some power when eliminating. If you can prove the easier goal $\blat A$ in the more challenging environment $\Gamma^\ominus$, then that's good enough to establish $A$.

These rules witness that $ \surd \dashv \blat$: Applying $\dash^\oplus$ to the context is like hitting it with $\surd$, and the elimination rule is morally the counit $( \surd \blat A) \to A$.

We have $\beta$ and $\eta$ equalities like so: \[\mathbf{el}_\blat\ \mathbf{in}_\blat\ M \mapsto_\beta M \qquad M \mapsto_\eta \mathbf{in}_\blat\ \mathbf{el}_\blat\ M\]

Steps Toward the Dream

Easy Direction

The easier direction is taking some function $h : \surd(A \to B)$ and trying to hack up the dependent record type I mentioned earlier. \[ \begin{array}{l} f : (\T \to A) \to (\T \to B)\\ f\ k\ t\ = (\mathbf{el}_\surd\ h)\ (k\ t)\\ \\ g : \surd A \to \surd B\\ g\ a = \mathbf{in}_\surd\ ((\mathbf{el}_\surd\ h)\ (\mathbf{el}_\surd\ a)) \\ \end{array} \] and then if I focus on the equation I need for \[ p : (a : \surd A) \to \lambda t . \mathbf{el}_\surd(g\ a) \equiv f\ (\lambda t.\mathbf{el}_\surd\ a)\\ \] I can see that by function extensionality it's \[ a : \surd A, t : \T \prov \mathbf{el}_\surd(g\ a) \equiv f\ (\lambda t.\mathbf{el}_\surd\ a)\ t\\ \] and then after substituting the definition of $f$ and $g$ I just gave, \[ a : \surd A, t : \T \prov \mathbf{el}_\surd(\mathbf{in}_\surd\ ((\mathbf{el}_\surd\ h)\ (\mathbf{el}_\surd\ a))) \equiv (\mathbf{el}_\surd\ h)\ ((\lambda t.\mathbf{el}_\surd\ a)\ t) \\ \] but by a $\sqrt\beta$ and an ordinary $\lambda\beta$ reduction, this is \[ a : \surd A, t : \T \prov (\mathbf{el}_\surd\ h)\ (\mathbf{el}_\surd\ a) \equiv (\mathbf{el}_\surd\ h)\ (\mathbf{el}_\surd\ a) \\ \] which is reflexivity.

Hard Direction

The harder direction is showing that the dependent record type is sufficient to reconstruct a $\surd (A \to B)$, not to mention the obligation of showing these two directions are mutually inverse, which I'm not going to attempt yet. For this I think I need a new rule or principle or something. Conjecturally, I like the following idea a lot: \[ \Gamma, \star \prov M : B\qquad \Gamma^\oplus \prov N : B \qquad \Gamma^\oplus, \star \prov M \equiv N : B \over \Gamma \prov \langle\!\langle M, N\rangle\!\rangle : B \] This is saying in one sense that: If you can build a term $M$ with a bit of extra help (namely $\star$) and you can also build a term $N$ with a different bit of extra help (namely that the context has been promoted), and if those terms are really the same term if you weaken both of them to include both kinds of help, then we should be able to make a real term by pairing them together.

Examining the semantics that I haven't described yet, this is also saying that if I can build a term at "the underlying level" (under an assumption of $\star$), and "at the parametricity relation level" (under a promoted context), and the two are compatible, then I get a real full term of type $B$. This is a way of gluing together a pair of morphisms that form a commutative square into a genuine morphism in the presheaf category.

But it's obviously problematic to just throw in a new term constructor, because now I want to say how it should compute, and I'm not sure yet how to do that. Nonetheless, if I have something like this, I can imagine starting with the problem \[ f : (\T \to A) \to (\T \to B), g: \surd A \to \surd B, p : \cdots \prov \textbf{???} : \surd(A \to B)\] and applying $\surd$-introduction and $\to$-introduction to reduce my problem to \[ f \div (\T \to A) \to (\T \to B), g \div \surd A \to \surd B, p \div \cdots, a : A \prov \textbf{???} : B\] and now here I can use this conjectural pairing principle I just postulated, and split into three goals: \[ f \div (\T \to A) \to (\T \to B), a : A, \star \prov M\textbf{?} : B\] \[ g \div \surd A \to \surd B, a :: A \prov N\textbf{?} : B\] \[ f \div (\T \to A) \to (\T \to B), g \div \surd A \to \surd B, a :: A \prov M \equiv N : B\] But now these $M$ and $N$ are easily filled in! I say: \[ f \div (\T \to A) \to (\T \to B), a : A, \star \prov f\ (\lambda t.a)\ \star : B\] \[ g \div \surd A \to \surd B, a :: A \prov \mathbf{el}_\surd\ (g\ (\mathbf{in}_\surd\ a)) : B\] and now I want to check \[ f \div (\T \to A) \to (\T \to B), g \div \surd A \to \surd B, p : \cdots, a :: A \prov \] \[f\ (\lambda t.a)\ \star \equiv \mathbf{el}_\surd\ (g\ (\mathbf{in}_\surd\ a)) : B\] But recall that \[p : (a : \surd A) \to \lambda t . \mathbf{el}_\surd(g\ a) \equiv f\ (\lambda t.\mathbf{el}_\surd\ a)\] so let's apply it to $\mathbf{in}_\surd\ a$ and use congruence along applying to $\star$ (both of which are well-typed in the extended context we're working in to obtain the equation, which has both $\star$ and a strong $a :: A$) to get \[ \mathbf{el}_\surd(g\ (\mathbf{in}_\surd\ a)) \equiv f\ (\lambda t.\mathbf{el}_\surd\ (\mathbf{in}_\surd\ a))\ \star\] We do one $\beta$-reduction to get \[ \mathbf{el}_\surd(g\ (\mathbf{in}_\surd\ a)) \equiv f\ (\lambda t. a)\ \star\] and this is now exactly the equation we wanted!

Semantics

A lot of things here probably seem super mysterious and pulled out of nowhere, and this is partly because I've leaned on a certain interpretation of this syntax to guide my guesses.

In one sentence, I'd say this syntax has a "syntactic semantics" in ordinary type theory following, as a guide, the semantics of type theory in the presheaf category over the one-arrow category $\bullet \to \bullet$.

My preferred notation for this is that we define operations taking the syntax of our language here down to ordinary dependent type theory without $x :: A$, $x \div A$, $\star$, $\T$, $\surd$, $\blat$, etc. $\Gamma^*$ takes a context to a ordinary context. $A^*$ takes a type to a ordinary type. $M^*$ takes a term to a ordinary term. $\sem \Gamma$ takes a context to a ordinary context. $\sem A(M)$ takes a type and an ordinary term to a ordinary type. $\sem M$ takes a term to a ordinary term.

Generally speaking, $\dash^*$ is boring and does "what you expect" on familiar things, but its behavior on newly-introduced type operators is still interesting and relevant. The operation $\sem\dash$ captures parametricity properties. Under some circumstances variables $x$ translate to two associated variables $x^*$ and $\sem x$.

$(\Gamma, x : A)^* = (\Gamma^*, x : A^*)$
$(\Gamma, x :: A)^* = (\Gamma^*, x^* : A, \sem x : \sem A(x^*))$
$(\Gamma, x \div A)^* = (\Gamma^*, x : A^*)$
$(\Gamma, \star)^* = (\Gamma^*, \star : 1)$

$\sem {\Gamma, x : A} = (\sem\Gamma, x^* : A, \sem x : \sem A(x^*))$
$\sem {\Gamma, x :: A} = (\sem\Gamma, x^* : A, \sem x : \sem A(x^*))$
$\sem{\Gamma, x \div A} = (\sem\Gamma, x : A^*)$
$\sem{\Gamma, \star} = (\Gamma^*, \star : 1, ☠ : 0)$

Semantics of function types and term constructors goes like so:

$((x : A) \to B)^* = (x^* : A^*) \to B^*$
$(\lambda x .M)^* = \lambda x^* . M^*$
$(M\ N)^* = M*\ N^*$
$\sem {(x : A) \to B}(f^*) = (x^* : A)(\sem x : \sem A(x^*)) \to \sem B(f^* x^*) $
$\sem {\lambda x . M} = \lambda x^*.\lambda \sem x . \sem M $
$\sem {M\ N} = \sem M\ N^*\ \sem N$

Semantics of $\T$ goes like so: (assuming we have a unit type $1$ with unit value named $\langle\rangle$)

$\T^* = 1$
$(\star)^* = \langle\rangle$
$(\blet M = \star \bin N)^* = [\langle\rangle/\star]N$
$\sem {\T}(\_) = 0 $
$\sem {\star} = \babort(☠) $
$\sem{\blet \star = M \bin N} = \babort(\sem M )$

Semantics of $\surd$ goes like so: (assuming values of $\Sigma$ types are constructed with $\langle\dash,\dash\rangle$ and projected with $\dash.1$ and $\dash.2$)

$(\surd A)^* = (a^* : A^*) \x \sem A(a^*)$
$(\mathbf{in}_\surd\ M)^* = \langle M^*, \sem M \rangle$
$(\mathbf{el}_\surd\ M)^* = (M^*).1$
$\sem {\surd A}(\_) = 1 $
$\sem {\mathbf{in}_\surd\ M} =\langle\rangle $
$\sem{\mathbf{el}_\surd\ M} = (M^*).2$

Semantics of $\blat$ goes like so: (assuming values of $\Sigma$ types are constructed with $\langle\dash,\dash\rangle$ and projected with $\dash.1$ and $\dash.2$)

$(\blat A)^* = 1$
$(\mathbf{in}_\blat\ M)^* = \langle\rangle$
$(\mathbf{el}_\blat\ M)^* = \sem M.1$
$\sem {\blat A}(\_) = (a^* : A^*) \x \sem A(a^*) $
$\sem {\mathbf{in}_\blat\ M} = \langle M^*, \sem M \rangle $
$\sem{\mathbf{el}_\blat\ M} = \sem M.2$

Semantics of the conjectural pairing operator goes like so:

$(\langle\!\langle M, N\rangle\!\rangle)^* = M^*$
$\sem{\langle\!\langle M, N\rangle\!\rangle} = \sem N$

Remaining Problems

The big things I'd like to know how to deal with or fix are:

I feel based on the semantics that it should be provable that $\blat 0 \dashv\vdash \T$. But I can't seem to prove this with the current set of rules! Maybe $\T$ or $\blat$'s rules are not the ones I want. More generally I think $\blat A$ should be a pushout of $A \leftarrow H \x A \to H$ and I can't prove this either.
I actually don't know if $\blat$ plays an important role for my needs? I wrote it down because I can, and it's part of the chain of adjunctions, but it seems like more of the game is in $\surd$.
I don't know how pairing should compute.
I don't know what I should be proving about the universe for parametricity purposes: I definitely think $\surd \rtype$ should be isomorphic to something useful, and I have a hunch it might be \[\surd \rtype \cong (A : \T \to \rtype) \x (t : \T) (B : A\ t) \to \rtype \] but I get lost in formal details when I try to investigate this further.