[On the phone] There's a man here with some sort of a parasite on his arm, assimilating his flesh at a frightening speed. I may have to get ahead of it and amputate. No... I don't know what it is or where it came from.I took some tolerably advanced math courses in college and graduate school. My graduate research group of choice was the Theory Umbrella Group (THUG), joint between the math and computer science departments. But one thing I never encountered in any of those courses, nor that-I-recall even in the THUG talks, was a type. Sets aplenty, but not types. Types seem to arise from specifically studying computer science, mathematics proper having no native interest in them. There are the "types" in Russell and Whitehead's Principia Mathematica, but those don't seem to me to have anything really to do with types as experienced in programming.
— Dr. Hallen, The Blob
Yet, over in the computer science department, we're awash in types. They're certainly used for reasoning about programs (both practically and theoreticially) — but at some point our reasoning may become more about the types themselves than about the programs they apply to. Type systems can be strikingly reminiscent of bureaucratic red tape when one is getting tangled up in them. So, if they aren't a natively mathematical concept, why are they involved in our reasoning at all? Are they natural to what we're reasoning about (programs), or an unfortunate historical artifact? From the other side, is reasoning in mathematics simpler because it doesn't use types, or does it not need to use types because what it's reasoning about is simpler?
Representation format
Looking back at the early history of programming, types evidently arose from the need to keep track of what format was being used by a given block of binary data. If a storage cell was assigned a value using a floating-point numerical representation, and you're trying to treat it as a series of ASCII characters, that's probably because you've lost track of what you meant to be doing. So we associate format information with each such cell. Note that we are not, at this point, dealing directly with the abstract entities of mathematics, but with sequences of storage bits, typically fixed-width sequences at that. Nor does the type even tell us about a sort of mathematical entity that is being stored, because within the worldview presented by our programming language, we aren't storing a mathematical entity, we're representing a data value. Data values are more abstract than bit sequences, but a lot less abstract than the creatures we'd meet in the math department. The essential difference, I'd suggest, is that unlike their mathematical cousins, data values carry about with them some of their own representation format, in this case bit-level representation format.
A typical further development in typing is user-defined (which is to say, programmer-defined) types. Each such type is still stored in a sequence of storage bits, and still tells us how the storage is being used to represent a data value, rather than store a mathematical entity. There is a significant difference from the earlier form of typing, in that the language will (almost certainly) support a practically infinite number of possible user-defined types, so that the types themselves have somewhat the character of mathematical abstract entities, rather than data values (let alone bit sequences). If, in fact, mathematics gets much of its character by dealing with its abstract entities unfettered by representational issues (mathematics would deal with representation itself as just another abstract domain), a computer scientist who wants that character will prefer to reason as much as possible about types rather than about data values or storage cells.
Another possible development in typing, orthogonal to user-defined types, is representation-independence, so that the values constrained by types are understood as mathematical entities rather than data values. The classic example is type bignum, whose values are conceptually mathematical integers. Emphasis on runtime efficiency tends to heighten awareness of representational issues, so one expects an inverse relation between that emphasis, and likelihood of representation-independent types. It's not a coincidence that bignums flourish in Lisp. Note also that a key twist in the statement of the expression problem is the phrase "without recompiling existing code".
Complicated type systems as crutches
Once we have types, since we're accustomed to thinking about programs, we tend to want to endow our type systems with other properties we know from our programming models. Parametric types. Dependent types. Ultimately, first-class types.
I've felt the lure of first-class types myself, because they abandon the pretense that complicated types systems aren't treating types computationally. There's an incomplete language design in my files wherein a type is an object with two methods, one for determining membership and one for determining sub/supertyping. That way leads to unbounded complications — the same train of thought has led me more recently to consider tampering with incompleteness of the continuum (cf. Section 8.4.2 of my dissertation; yet another potential blog typic). As soon as I envisioned that type system, I could see it was opening the door to a vast world of bizarre tricks that I absolutely didn't want. I really wanted my types to behave as mathematical sets, with stable membership and transitive subtyping — and if that's what you want, you probably shouldn't try to get there by first giving the methods Turing power and then backing off from it.
But, from the above, I submit that these complicated type systems are incited, to begin with, when we start down the slippery slope by
- tangling with data values —halfway between the abstract and concrete worlds— instead of abstract mathematical entities, and
- placing undue emphasis on types, rather than the things they describe. This we did in the first place, remember, because types were more nearly mathematical; the irony of that is fairly intense.
Where next?
What we need, it seems, is the lightweight character of mathematical reasoning. There's more to it than mathematical "purity"; Haskell is fairly pure, but tbh I find it appallingly heavy. I find no sense of working with simple primitives — it feels to me more like working on a scaffold over an abyss. In mathematics, there may be several different views of things any one of which could be used as a foundation from which to build the others. That's essentially perfect abstraction, in that from any one of these levels, you not only get to ignore what's under the hood, but you can't even tell whether there is anything under the hood. Going from one level to the next leaves no residue of unhidden details: you could build B from A, C from B, and A from C, and you've really gotten back to A, not some flawed approximation of it that's either more complicated than the original, more brittle than the original, or both.
Making that happen in a language design should involve some subtle shifts in the way data is conceptualized. That isn't a digression in a discussion of types, because the way we conceptualize data has deep, not to say insidious, effects on the nature of typing. As for the types themselves, I suggest we abandon the whole notion of types in favor of a lightweight mathematical notion of sets — and avoid using the word "type" as it naturally drags us back toward the conceptual morass of type theory that we need to escape.
I'm not sure why it isn't possible to allow "first-class" types, do reasoning with them, and let the type checker barf when it can't decide whether an expression is well-typed. If we want the type system to reason about complex program properties then we have to live with some amount of type-level nudging by the programmer.
ReplyDeleteWith the sort of first-class types I was describing, the type checker is the program — type-checking is just another form of general computation. Not deciding whether an expression is well-typed means type-checking nonterminates, which can't be decided in general in finite time so it's not barfing but hanging. And it's possible that the determination of well-typing will have bugs in it, introduced either accidentally or deliberately. It isn't even necessary for a deliberate anomaly in the typing to be malicious; the programmer may have thought it a clever solution to some problem or other.
ReplyDeleteI think you missed a few type for math: imaginary numbers composed of a real and imaginary component are certainly a type, you could consider that the base type of math is the digit 1, all other numbers can be described as either a 1, the absence of 1(0), a composite of 1s (2 = 1 + 1), 1s with additional characteristics (negative, positive). I'm sure I could continue but the point is that math has been around for so long that the "types" are no longer seen as composites but as "first-class" types. Types, in programming languages, come about because the world is based on types, when was the last time you had to directly manipulate a quark to get a glass of water?
ReplyDeleteI didn't miss those things. I've worked, to some nontrivial extent or other, with complex numbers, quaternions, vectors, matrices, matroids, graphs, and categories (that I recall, off hand). I simply maintain that those are sets rather than types, with the qualitative differences that sets are abstract while types (as met in actual programming) are in the twilight between abstract and concrete, and sets are lightweight while types are comparatively heavy (weighed down, in a curiously apt metaphor, by all that concrete).
ReplyDeleteNote that you could construct complex numbers as pairs of real numbers — but you don't have to. You could treat complex numbers as primitive in their own right, and safely assume that there is nothing at all under the hood, unless it's convenient for what you're doing to assume that there is something under the hood after all. That level of clean abstraction generally isn't to be found in our programming languages.
010101000111100101110000011001010111001100100000011100000111001001101111011000100110000101100010011011000111100100100000011000110110111101101101011001010010000001100110011100100110111101101101001000000111010001101000011001010010000001100110011000010110001101110100001000000111010001101000011000010111010000100000011000110110111101101101011100000111010101110100011001010111001001110011001000000111010101101110011001000110010101110010011100110111010001100001011011100110010000100000001100010010000001100001011011100110010000100000001100000010000001101001011011100010000001100001001000000111000001100001011101000111010001100101011100100110111000100000011000010110111001100100001000000111011101100101001000000110111001100101011001010110010000100000011000010010000001110111011000010111100100100000011101000110111100100000011010010110010001100101011011100111010001101001011001100111100100100000011101110110100001101001011000110110100000100000011001110111001001101111011101010111000001101001011011100110011100100000011011110110011000100000001100010010011101110011001000000110000101101110011001000010000000110000001001110111001100100000011110010110111101110101001000000110000101110010011001010010000001110010011001010110000101101100011011000111100100100000011011000110111101101111011010110110100101101110011001110010000001100001011101000010000001100110011100100110111101101101001000000111010001101000011001010010000001101000011000010111001001100100011101110110000101110010011001010010011101110011001000000111000001100101011100100111001101110000011001010110001101110100011010010111011001100101
ReplyDeleteThis is what the hardware sees at the circuit level, it is on or off in patterns of powers of 2. To me the abstract concepts of types has to do with defining what do we want these 2 patterns to represent to us humans.
Haven't you heard about generic interfaces? Well those are types, but they don't define the representation. That is, they are abstract, and that how we express "set" in a a language where "set" is not a first class thing. Then you will have subtypes for ordered sets, vectos, matrices and so on. Where is the flaw to this approach? Is it conceptually flawed, are there problems with the implementation, or is it a matter of language selection (You are using LISP, right?).
ReplyDeleteI have thought in the past that we need an algebra for classes (rather than types), so we can express differently: "All the roses are red" and "The roses can only be red", where in the first I talk about a common attribute of all the (current) objects of a class (rose), and in the second I talk about an attribute of the class itself [more on that later], so it is not a "coincidence" that we only have red roses in our reference set.
Take the Syllogism:
- All men are mortal
- All Greeks are men
----------------------
- All Greeks are mortal
Is it a coincidence* that all greeks are mortal?
*: a property of the given situation, that is that it is derived from known parameters. To word it in another way: is the predicate "All Greeks are mortal" a tautology or does it depend on parameters?
I don't know if I'm expressing myself well enough... you can define the operation "is instance of", that for example, given a object and a class it will evaluate to true or false. So, we can carry that to first order logic, and express "For all" (as in: those in a set) and "For all possible" (as in: those of a class, those in the set of all possible objects of a class).
Then again in the Syllogism, I'm I talking of the set of all known Greeks, or I'm talking about Greeks as a class? The same goes for men.
I know that when I say "Those of a class" it can be confusing, it means to address all the possible objects the class can have (being they a finite or infinite set), then those objects with which I'm working with are just a subset of that.
I know, you must be thinking: "but it is just another set". Well, yes the class "has" a set of all the possible objects, but it adds its relationships to others classes, and that is a graph you can reason about (I know that that is also a set... but math is just a literary field if you want to take the point of view of symbol manupalion, if you want to see set, you have sets). Think about it, you can carry on to define inheritance, association and attributes. In these aspects it will be similar to relational algebra but freed from the concept of "primary key" and "foreign key" for a more object oriented perspective, with the addition of inheritance, and why not: generics. Why to free it from those? because those are implementation details, they are about the representation.
Maybe, I'm talking about an extension to relational algebra... if that is what you want to see.
I wonder how far is it plausible to take this concept, I would like to develop it as a theory [or a cool toy... or find out if it has been done] but I got dead lines to meet, you know?
I think you are saying, could be wrong here, that a set can contain heterogeneous elements. This would allow the construction of numbers with attributes. Using as an example integer numbers being a magnitude combined with a sign attribute or a ‘set’ of things that represent a number. This is analogous to a ‘struct’ kind of entity where you have combined data without an associated active element. The arithmetic domain language describes the process of adding these numbers as A + B = C. But this is just hiding the messy bits that are happening underneath that + operator. The actual operations are more like:
ReplyDeleteIf sign of B is positive and sign of A is positive then C is a positive sign combined with the sum of the magnitudes of A and B
If sign of B is positive and sign of A is negative and magnitude of A is greater than magnitude of B then C is a negative sign combined with the difference of the magnitudes of A and B.
If sign of B is positive and sign of A is negative and magnitude of B is greater than magnitude of A then C is a positive sign combined with the difference of the magnitudes of A and B.
There are more rules like this but the point is that the rules are not implicit with the set. You are taught the rules at some point and then do not think of them anymore leading to the assumption that you are dealing with a base type. The same thing happens with programming languages that allow operator overloading. You can create a type, or class, called complex that consists of two real numbers, floats, and define what happens when using the + operator all within the context of the complex type. Once you have this type, adding two complex types is as simple as A + B. In general, people using the complex type are going to forget about the type being a composite thing and use it as if it was a primary type.
The ability to overload operators and/or provide ‘clean’ interactions between types has been available since at least 1979 with the development of the C++ programming language. There are other languages that also include these features that probably pre-date C++ but I am most familiar with C++. If you are using a programming language that does not provide these features then you may want to reconsider your language choice. We are always trying to develop better domain languages, after all I do not know anybody who wants to do complex math with Roman numerals.
I am fascinated and intrigued by your post, John. So, what you write in this post could lead us to a new programming paradigm? How do you envision this?
ReplyDeleteYou should check Translating mathematics into code: Examples in Java, Python, Haskell and Racket, http://matt.might.net/articles/discrete-math-and-code/
ReplyDeleteIt tries to solve the exact problem with type you described here.
I agree, John. I guess that the best single data type to start is just s-expression. Perhaps even implementation of integers, as type different from symbols, was step back to Lisp. But symbols should be somehow reduced to s-expressions; dichotomy between processing of symbols (like make-symbol, gensym) and processing of s-expressions isn't elegant.
ReplyDeleteIt seems to me that `math` is inherently lightweight compared to programming. Many concerns of programming do not seem to readily apply in mathematics:
ReplyDelete* Modularity - information hiding, independent development, encapsulation
* Extension - pluggability, open systems, communication
* Performance - efficiency, compilation
* Resource Control - linearity, garbage collection, ownership, quotas, job control
* Security and Authority - capabilities, sealing
* Failure Modes and Recovery
* Licensing & Intellectual Property
Any of these concerns can influence `types` in various ways that are outside mathematical idealism.
I agree that a lot of type systems seem like `red tape`.
Some of those concerns are indeed surely more cs than math — performance and licensing seem especially so. But with others, it may be our approach that is un-math-like, and that may be only because we haven't yet risen to the challenge of properly reapplying the mathematical principles to cs. In that regard, modularity especially catches my eye. I won't frame this well, perhaps, as I don't have a pre-practiced smooth articulation of the thought, but — If you're working with complex numbers as primitives, with real numbers defined as a subset of them, and somebody does something based on complex numbers being defined as pairs of real numbers, you can say, "but I'm not defining complex numbers that way". That seems to be information hiding... but it's lightweight.
ReplyDeleteA thought that's crossed my mind since I wrote this is that some part of the heaviness of types is because we don't mean types as much for us to reason about programs as for computers to reason about programs — and, as I noted on LtU, we think and are clumsy at computing, while computers compute and are clumsy at thinking. So types by catering to computers become more difficult for us.
This comment has been removed by the author.
ReplyDeleteThis is an interesting discussion.
ReplyDeleteThe world I write software in is rarely neat and tidy. There are not nice clean sets with unambiguous rule sets. Instead we have semantics that evolve and morph as the understanding of the problem space and the solutions evolve. We have relationships between those semantics that are far from hard and fast. Its not possible to build an unambiguous grammar because the problem exists in a space full of ambiguity and edge cases.
The point is that a programming language suitable for describing clean abstract models will inevitably be more math-like than a language that has evolved to describe messy people problem spaces.
Types and the structure around them are a very useful construct that form a functional bridges between messy semantics and clean data.
The difference between an int32 and an int64 is not visible in 2=1+1 but elsewhere.
ReplyDeleteI think that this observation has something to do with the concept of "light weight sets" -- if we want them to be light weight, I think we are sometimes going to be playing fast and loose with the boundary cases.
Part of the heaviness here is the that that you are working with int32 and int64. It's the existence of the boundary cases that drag them down. In contrast, bignum has no boundary cases (short of things like the amount of virtual memory available).
ReplyDelete> Indeed, contrasting types and sets, a major difference between them is that types have object identity
ReplyDeleteI'm struggling to understand what you mean by that. How does the Haskell type "Monad m => m a -> (a -> m b) -> m b" or the Java type "List<? extends Map<String, ? extends Number>>" have an object identity?
I'll try to clarify, and then carry the thought one step further than I had above.
ReplyDeleteI did note that not all types have this. What I meant by the term object identity here —a term whose habit I believe I picked up years ago from the Brown University database folks— is types with separate identical definitions, which are distinct types because the definitions are separate, rather than the same types because the definitions are identical. So that when you say "this is of type X" and "that is of type Y", though X and Y are identically defined, you aren't saying the same things about this and that.
It does occur to me, though, that some such identity is probably necessary for information hiding, which math reasoning does better than we do. Yet I still think it's weighing down our types. Looking to reconcile those two thoughts, perhaps in lightweight reasoning the identity device should be orthogonal to sets.
I got ya, thanks for clarifying. Nominative vs structural types.
ReplyDeleteA different way of viewing Types is that they are a mechanism for error control, and as such their use better matches the human psychology issues.
ReplyDeleteThe same comparisons can be made with engineering and scientific measurements with their need for well understood units and dimensions (Mass, Length, Time :: metres, kilogram, second; etc.). They also help detect mistakes when used - though it is common for software to repeat such unit and dimension errors, unless embedded (e.g. MathCAD).
Russel and Whitehead's types _are_ connected to types in programming languages! Their type theory was simplified into "simple type theory", which was then formalized by Alonzo Church. If you omit the logical constants from Church's theory, you get the simply-typed lambda-calculus, the prototypical typed programming language.
ReplyDeleteThis also suggests that types have some motivation independent of binary data formats, since it happened before there were electronic computers. :)
I did choose my wording rather carefully there (I'm not perfect, but I do try to word things cautiously); with emphasis added: R&W's types "don't seem to me to have anything really to do with types as experienced in programming." My point is that the actual experience in programming is something quite qualitatively different.
ReplyDeleteYou are probably already familiar with this, but I was very happy when I finally understood that a type represents what we have already proved about a value.
ReplyDeleteI don't know if this is how types are used but I am very excited by what they could be.
Yes! It's exciting, I agree. Yet I suspect that somehow, the way we've been doing types may not offer any path of incremental migration to that exciting possibility ("you can't get there from here"). Hence my desire to understand better where our approach to types has gone wrong.
DeleteI don't know how we've been doing types so I'll agree with you. Treating a class (in object oriented programming) like a type is wrong because you could want a type that only includes balanced trees and you could have a class that represents trees kitting patterns. When you add a pattern for a sweater to the tree, its class doesn't change but some of its types might.
DeleteI like that you allow polymorphic functions to dispatch on arbitrary predicates. That covers most things that I would want.