I ran into a hiccup the other week with my work on finding a good definition of schemas. After mulling over the problem a bit, I realized a part of why I even had it to begin with is that even though I knew what I ultimately wanted schemas to let me do, I didn't have a clear idea of what a schema should be other than "a generalization of a set". I more or less scrapped what I had been doing up to that point and got to task with figuring out a clear answer to this question, since I didn't really feel I could properly move forward without having a solid idea of the end-goal, so to speak.
Since my original motivation for this whole thing came from certain issues with dealing with the hyperreals (and, more generally, objects built out of the ultrapower construction) I decided that would be a good place to start. If you don't know about the ultrapower construction you don't really need to know a lot except this (which I'll frame specifically in terms of the hyperreals): the construction introduces a distinction between *internal* subsets and *external* subsets of \*R. An important fact of the hyperreals is that any infinite internal subset of \*R has infinitesimal and/or infinite elements. As a result, even though N, Z, Q, and R can all be embedded in the reals in a natural way, for the purposes of reasoning about the internal properties of the hyperreals we should use the hyperextensions of each of these. E.g., the hyperextension of N is \*N, called the *hypernaturals*, which contains elements that are infinite in the sense that they are larger than any standard natural.
The internal subsets of \*R are given by \*P(R), the ultrapower of the powerset of the reals. In turn, we can talk about internal *collections* of internal subsets of \*R—this corresponds to \*P²(R), the ultrapower of the powerset of P(R). This continues ad infinitum.
How I've taken to interpret this is that \*P(R) is the "correct setting" for second-order logic, not P(\*R). Likewise, \*P²(R) is the "correct setting" for third-order logic on \*R rather than P³(\*R), and so forth. To extend this to schemas, we should think of them as a setting for a restricted form of higher-order logic on some set. This idea isn't entirely novel (there's something called Henken semantics for HOL which seems to be similar), but I believe the way I need to wrap it up in an object to achieve my eventual goal is a new idea.
This has led me to my definition of a *concrete schema complex* on a set, which is I believe how any "good" definition of a schema should be realized (at least in some cases) when we're considering a way to put this structure on an actual set. A concrete schema complex on a set X is a sequence of sets ∆ = (∆(0) = X, ∆(1), ∆(2), ∆(3), ...) along with embeddings ∆(k) → ∆(k+1) and embeddings ∆(k) → P^(k)(X). Moreover, when we include the natural embeddings P^(k)(X) → P^(k+1)(X) where P^(0)(X) = X, there is an obvious (infinite) diagram and we impose that this diagram must commute. Lastly, with respect to the language {\exists, \forall, \vee, \wedge, \neg, \sup, \inf} where \sup and \inf are the *binary* operators on the lattice structure of these sets, we require that ∆(k) and P^(k)(X) are elementarily equivalent (which I think means that the map ∆(k) → P^(k)(X) is an elementary embedding?) Intuitively, this says that a schema complex on X is a collection of substructures of the (iterates) powersets of X which behave just like powersets as long as we restrict ourselves to first-order sentences in the appropriate language.
I've already begun working to abstract this definition, and I believe I'm making good progress though I'm far from finished as of yet. I'll probably end up dropping "complex" from the name and just refer to the whole collection of data as a schema. Originally I had planned on a schema as being a much smaller collection of data, but some concerns led me to the larger collection in using now and that's when "schema complex" got started. Now that I've zeroed in on this "restricted HOL" idea it feels like the larger collection of data is the right choice and now having "schema" and "schema complex" be distinct things is a bit pointless.