Read This First
This page is a design scratchpad. Please see: http://clojure.org/datatype for the current documentation.
Clojure issue tracking now lives at http://dev.clojure.org/jira, and the wiki is at http://dev.clojure.org. These Assembla pages are kept online for historical interest only.
reify, deftype
This page describes work in progress in the 'new' branch of Clojure. All features are subject to change - feedback is welcome.
Motivation
Important code-generation capabilities are locked in fn.
Several parts of Clojure are written in Java for the ultimate performance - we'd like to be able to write them in Clojure with equal performance.
Clojure is defined in terms of a set of abstractions, currently written in Java. We'd like to be able to define and implement those abstractions in Clojure. The datatype features, when coupled with protocols, will enable that.
Implementation
The basic idea is to harness the code generation of fn, which compiles Clojure code to bytecode and supports lexical closure, and make it available in a form that could create classes of types other than anonymous derivees of IFn. This support comes in the form of 2 new special forms/macros:
1) reify (previously brainstormed as newnew) is the most dynamic. Like proxy, it creates an instance of an anonymous class that implements one or more protocols or interfaces. The method bodies of reify are lexical closures, and can refer to the surrounding local scope. reify differs from proxy in that:
- Only protocols or interfaces are supported, no concrete superclass.
- The method bodies are true methods of the resulting class, not external fns.
- Invocation of methods on the instance is direct, not using map lookup.
- No support for dynamic swapping of methods in the method map.
The result is better performance than proxy, both in construction (proxy creates the instance and a fn instance for each method), and invocation. reify is preferable to proxy in all cases where its limitations are not prohibitive.
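As a rough illustration of the comparison above, the following sketch builds the same java.util.Comparator once with proxy and once with reify. It assumes the syntax of released Clojure (1.2 and later), where each reify method names the target object as an explicit first argument, which differs from the draft syntax described on this page. The names by-length-proxy and by-length-reify are made up for the example.

```clojure
;; Assumption: released Clojure (1.2+) syntax; reify methods name the
;; target object explicitly (here as _), unlike the draft described above.

;; proxy: method bodies are external fns held in a method map,
;; and `this` is bound implicitly.
(defn by-length-proxy []
  (proxy [java.util.Comparator] []
    (compare [a b] (- (count a) (count b)))))

;; reify: method bodies are compiled as real methods of the class.
(defn by-length-reify []
  (reify java.util.Comparator
    (compare [_ a b] (- (count a) (count b)))))

(sort (by-length-reify) ["ccc" "a" "bb"])   ;=> ("a" "bb" "ccc")
```

Both objects behave the same to callers; the difference is in how the methods are compiled and dispatched.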
2) deftype dynamically generates compiled bytecode for an anonymous class with a set of given fields, and, optionally, methods for one or more protocols and/or interfaces. Instances of the resulting type will have a supplied type tag. deftype is suitable for dynamic and interactive development: it need not be AOT compiled, and can be re-evaluated in the course of a single session. deftype is similar to defstruct in generating data structures with named fields, but differs from defstruct in that:
- deftype generates a unique class, with fields corresponding to the given names.
- keyword field access provides better performance than even the generated accessors of defstruct
- the resulting class has a proper type slot, unlike conventions for encoding type for structs in metadata
- fields can have type hints, and can be primitive
- a deftype can implement one or more protocols and/or interfaces
Like defstruct, deftype can generate classes with (optional) support for the IPersistentMap interface, allowing them to be used anywhere that maps can. Overall, deftypes will be better than structmaps for all purposes, especially for defining your own data abstractions.
3) In addition, when deftype is AOT compiled:
- it generates a named class with the supplied name
- because the class is named, it has an accessible constructor
- because the class is named, its fields can be accessed from outside the class using (.field an-instance)
AOT-compiled deftype may be suitable for some of the use cases of gen-class, where its limitations are not prohibitive. In those cases it will have better performance than gen-class.
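In released Clojure (1.2 and later) deftype always generates a named class, so the constructor and field access described for the AOT case can be tried directly at the REPL. A minimal sketch under that assumption; Point is a hypothetical type used only for illustration.

```clojure
;; Assumption: released Clojure (1.2+), where deftype always yields a
;; named class, whether or not it is AOT compiled.
(deftype Point [x y])

(def p (Point. 1 2))   ; accessible constructor, fields in declared order
(.x p)                 ;=> 1, field access from outside the class
(.y p)                 ;=> 2
```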
Details
Note - although an attempt is made to keep this up to date, you should always get the definitive documentation on the version you are using via (doc reify), (doc deftype) etc.
reify is a macro with the following structure:
(reify options* specs*)
where options can be:
:as this-name
and specs are:
protocols-or-interface-or-Object
(methodName [args*] body)*
methods should be supplied for all methods of the desired interface(s). You can also define overrides for methods of Object.
- return type can be indicated by a type hint on method name
- arg types can be indicated by a type hint on arg names
- you do not supply a parameter corresponding to the target object ('this' in Java)
- thus methods for protocols will have one fewer arguments than protocol fns
- if you leave out all hints: will try to match on same name/arity method in interface(s)
- this is preferred
- if you supply any hints at all, no inference is done, so all hints (or default of Object) must be correct, for both arguments and return type
- If a method is overloaded in an interface, multiple independent method definitions must be supplied.
- if overloaded with same arity in interface you must specify complete hints to disambiguate
- missing hint implies Object
- recur works to method heads
- The method bodies of reify are lexical closures, and can refer to the surrounding local scope
(str (let [f "foo"]
       (reify Object
         (toString [] f))))
== "foo"
(seq (let [f "foo"]
       (reify clojure.lang.Seqable
         (seq [] (seq f)))))
== (\f \o \o)
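The rule about overloads with the same arity can be illustrated with java.lang.Appendable, whose append method is overloaded on char and CharSequence. The sketch below assumes released Clojure (1.2 and later) syntax, with an explicit first argument for the target object, and supplies complete hints for every overload. The function name collecting-appendable is hypothetical.

```clojure
;; Assumption: released Clojure (1.2+) syntax with an explicit `this`.
;; Appendable declares append(char), append(CharSequence) and
;; append(CharSequence, int, int). The two one-argument overloads share
;; an arity, so every argument and the return type are hinted.
(defn collecting-appendable [^StringBuilder sb]
  (reify Appendable
    (^Appendable append [this ^char c]
      (.append sb c)
      this)
    (^Appendable append [this ^CharSequence cs]
      (.append sb cs)
      this)
    (^Appendable append [this ^CharSequence cs ^int start ^int end]
      (.append sb cs start end)
      this)))

(let [sb (StringBuilder.)
      ^Appendable a (collecting-appendable sb)]
  (.append a "ab")
  (.append a \c)
  (str sb))   ;=> "abc"
```

The method bodies close over sb, so the same example also shows the lexical-closure behavior described above.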
deftype is a macro with the following structure:
(deftype Name [fields*] options* specs*) ;options and specs same as for reify
that does the following:
- Dynamically generates compiled bytecode for an anonymous class with the given fields, and, optionally, protocols, interfaces and methods.
- The Name will be used to create a dynamic type tag keyword of the form :current.ns/Name. This tag will be returned from (type an-instance).
- A factory function of current.ns/Name will be defined, overloaded on 2 arities, the first taking the designated fields in the same order specified, and the second taking the fields followed by a metadata map (nil for none) and an extension field map (nil for none). The class will have the (immutable) fields named by fields, which can have type hints.
- Protocols, interfaces and methods are optional. The only methods that can be supplied are those declared in the protocols/interfaces. Note that method bodies are not closures: the local environment includes only the named fields, and those fields can be accessed directly.
- As with reify, method definitions take the form: (methodname [args] body)
- The argument and return types can be hinted on the arg and methodname symbols. If not supplied, they will be inferred, so type hints should be reserved for disambiguation.
- The class will have implementations of two (clojure.lang) interfaces generated automatically: IObj (metadata support) and ILookup (get and keyword lookup for fields). If you specify IPersistentMap as an interface, but don't define methods for it, an implementation will be generated automatically.
- In addition, unless you supply a version of hashCode or equals, deftype will define type-and-value-based equality and hashCode.
- In the method bodies, the (unqualified) name can be used to name the class (for calls to new etc)
(deftype Bar [a b c d e])
(def b (Bar 1 2 3 4 5))
b
#:Bar{:a 1, :b 2, :c 3, :d 4, :e 5}
(:c b)
3
(type b)
:user/Bar
(meta (with-meta b {:foo :bar}))
{:foo :bar}
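To show methods that read fields directly, here is a small sketch in the syntax of released Clojure (1.2 and later), which differs from the draft above in that each method takes an explicit first argument for the target object and the type is constructed with (Span. ...) rather than a factory function. Span is a hypothetical type.

```clojure
;; Assumption: released Clojure (1.2+) deftype syntax.
;; The fields start and end are primitive longs and are referenced
;; directly inside the method bodies; nothing else is in scope.
(deftype Span [^long start ^long end]
  clojure.lang.Counted
  (count [_] (int (- end start)))
  Object
  (toString [_] (str "[" start "," end ")")))

(count (Span. 2 5))   ;=> 3
(str (Span. 2 5))     ;=> "[2,5)"
```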
When deftype is AOT compiled:
- When compiling, generates compiled bytecode for a class with the given name, prepends the current ns as the package, and writes the .class file to the *compile-path* directory.
- A public constructor will be defined, taking the designated fields followed by a metadata map (pass nil for none) and an extension field map (pass nil for none).
Prerelease changes under consideration
- Split deftype and defrecord
- defrecord will provide full impl of IPersistentMap et al
- deftype - nothing, not even equality/hash
- Provide stable names, even dynamically
- ditto for definterface
- Means factory fn will have to have another name or will clash if type imported
- auto-import types into same ns?
- but want to discourage (Type. x y) ctor usage?
- what name then for factory?
- Factory fns based on static methods?
- so useful from Java too?
- must provide some way to define
- same static method impl capability can be used for helpers
- Destructuring in methods?
- record equality
- basic map rules, or incorporate type?
- ctor or factory from k v pairs
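For reference, released Clojure did split the two forms. The sketch below assumes Clojure 1.3 or later (for the ->Bar and map->Bar factory functions) and contrasts a record with a bare type. RawBar is a hypothetical name.

```clojure
;; Assumption: released Clojure 1.3+.
(defrecord Bar [a b c d e])     ; full map behavior, value-based equality
(deftype RawBar [a b c d e])    ; fields only, no equality or hash supplied

(def b (->Bar 1 2 3 4 5))

(:c b)                                        ;=> 3
(= b (->Bar 1 2 3 4 5))                       ;=> true
(= (RawBar. 1 2 3 4 5) (RawBar. 1 2 3 4 5))   ;=> false, identity only
(= b (map->Bar {:a 1 :b 2 :c 3 :d 4 :e 5}))   ;=> true, factory from a map
(meta (with-meta b {:foo :bar}))              ;=> {:foo :bar}
```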
New ideas/scratchpad
- Explicit ctor sigs?
- Explicit private methods?
- :mixin [(impl-x ...) (impl-y ...) ...]
- where mixins are macro-like form-returning fns
- returned forms (must return sequence of forms) get spliced into body
- can take (unevaluated) args passing field names or anything else they need in order to work
- mutable fields for reify
- where would they go?
- will look different from deftype in any case
- warn on incomplete protocol/interface
- loading accompanying namespace for deftype?
- feature of gen-class, but gen-class is AOT-only
- people looking to consume class from Java need to init namespace(s)
- implicit this in deftype/reify/extend-type, extend-class?
- fits deftype/reify better than extend-*
- recur behavior is indicator
- reify can't have implicit 'this', due to nesting/macro-wrapping, but could have named this as before
- (reify this-name P ...)
- how to disambiguate this-name from P?
- extend-type style: P1 (foo [] ...) (bar [] ...) P2 (baz [] ...)
- vs [P1 P2] (foo [] ...) (bar [] ...) (baz [] ...)
- multiple arity in defprotocol - drop grouping parens
- (defprotocol P (foo [x] [x y] [x y z] "docs"))
- Fold defclass into deftype, i.e. when compiling deftype Foo get class with same name, Foo, when not, get Foo__nnn. This will reduce to almost nothing the changes required to expose a deftype by name to Java.
- name clash between class name and factory fn
- all types (including compiled) will implement IDynamicType?
- use of unqualified name in method bodies of deftype (want to support that anyway)
- if deftype has been re-evaluated
- calls to (new Foo ...) will still use compiled class
- calls to (.field afoo) will work only for Foos created by new
- users are always confused and disappointed by the lack of dynamism of named things; using 2 different names separates expectations, while combining them requires more understanding
- Validators for deftypes
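Two of the syntax ideas in the list above, the extend-type style grouping of methods under each protocol and the multiple-arity defprotocol signature without grouping parens, can be tried in released Clojure (1.2 and later). The sketch below assumes that syntax; Greeter, Named and Person are hypothetical names.

```clojure
;; Assumption: released Clojure (1.2+) syntax.
;; Multiple arities of one protocol fn are listed side by side.
(defprotocol Greeter
  (greet [x] [x greeting] "Returns a greeting for x."))

(defprotocol Named
  (label [x]))

;; Methods are grouped under the protocol they belong to.
(deftype Person [nm]
  Named
  (label [_] nm)
  Greeter
  (greet [this] (greet this "Hello"))
  (greet [_ greeting] (str greeting ", " nm)))

(greet (Person. "Ann"))        ;=> "Hello, Ann"
(greet (Person. "Ann") "Hi")   ;=> "Hi, Ann"
```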
Old ideas/scratchpad
This is a scratchpad for ideas relating to a new datatypes feature. This is not a promise of any feature nor attributes of the feature, this is here to allow for feedback and input.
There are several motivations for datatypes:
- Provide highest-performance data field access
- equivalent to Java
- obviate need to go to Java
- Provide proper types where structs do not
- Support primitives where structs do not
- Enforce proper immutability semantics
- Separate methods from data
Basics
A datatype is a simple definition of a class that includes only fields with some hints.
(deftype org.myns.Foo
fred
#^int ethel
#^String lucy)
The above would produce a class named org.myns.Foo. The generated class would have the following:
- A public ctor taking (Object, int, String)
- An ordinary Clojure function (Foo ...) matching/wrapping the ctor
- public final fields fred, ethel, lucy of the corresponding types
- a definition of equals based upon identical type + per field =
- a definition of hashCode based upon combining field hashes, w/nil handling
- Support for metadata
- a public final field (__meta__ ?) for a metadata map
- a public ctor taking metamap followed by fields
- support for IMeta and IObj (meta and withMeta methods)
- Support for ILookup (basic support for Clojure's associative get)
- Lookup based upon keywordified versions of field names (:fred, :ethel, :lucy)
- fast switch-like implementation
- Support for print/read
- #org.myns.Foo{:fred 1 :ethel 42 :lucy "ricky"} ?
- The class would be public and final
- deftypes can be AOT compiled to .class files
- When not precompiled, will be loaded into the root classloader
- Support for implementing host interfaces
- can specify interfaces (only!) with :implements [interfaces...]
- methods can be specified using inline defs:
- (amethod [this args...] ...)
- can also override Object methods (equals, hashCode, toString)
- defining only one of equals/hashCode will throw exception
- this (under any name?) must be explicitly supplied as first arg, but will be auto-hinted to class
- Since deftype yields real, named classes, whose names matter, the fields and interfaces cannot be changed without restarting the JVM
- but changes to method implementations will be dynamically picked up when deftype is re-evaluated (promise this forever?)
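The equals and hashCode behavior listed above (identical type plus per-field =, and a combined field hash that tolerates nil) can be written out by hand. This is only a sketch in released Clojure (1.2 and later) syntax, not the generated code; the 17/31 hash combination is an arbitrary choice for the example, and ethel is hinted as long rather than int.

```clojure
;; Assumption: released Clojure (1.2+) deftype syntax; hand-written
;; stand-ins for the equals/hashCode that the proposal would generate.
(deftype Foo [fred ^long ethel ^String lucy]
  Object
  (equals [_ other]
    ;; identical type, then = on each field
    (and (instance? Foo other)
         (let [^Foo o other]
           (and (= fred (.fred o))
                (= ethel (.ethel o))
                (= lucy (.lucy o))))))
  (hashCode [_]
    ;; (hash nil) is 0, so nil fields need no special case;
    ;; unchecked-int truncates the long accumulator to an int
    (unchecked-int
      (reduce (fn [h x] (+ (* 31 h) (hash x)))
              17
              [fred ethel lucy]))))

(= (Foo. 1 42 "ricky") (Foo. 1 42 "ricky"))     ;=> true
(= (Foo. 1 42 "ricky") (Foo. nil 42 "ricky"))   ;=> false
```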
Issues
- How much compatibility with maps?
- equality, hash etc
- deftype instanceof j.u.Map?
- What, if any, relationship to ns enclosing deftype?
- implementing interfaces creates an init dependency on the impl code
- i.e. consumers of deftype classes need the deftype body to run in order to install fn bodies for methods
- same :impl-ns/:load-impl-ns stuff as gen-class?
- more likely to be more than one deftype in a single file
- potential method name conflicts between auto-implemented (ILookup, IMeta) interfaces and explicit interfaces (Java)
Other ideas:
Associative/Expando support
- support for assoc k v
- would require __expando__ field of map type
- expandomap arg added to ctor with metamap
- logic to look in expando when not found in fields
- print/read enhancements
- dissoc of field key yields ordinary map no longer of type?
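Records in released Clojure behave much like the expando idea above, so they make a convenient way to see it in action. The sketch assumes Clojure 1.3 or later (for the ->Foo factory); it illustrates the behavior and says nothing about how the extra keys are stored.

```clojure
;; Assumption: released Clojure 1.3+ defrecord.
(defrecord Foo [fred ethel lucy])

(def f (assoc (->Foo 1 42 "ricky") :extra :x))

(instance? Foo f)                  ;=> true, still a Foo after assoc of a new key
(:extra f)                         ;=> :x
(:ethel f)                         ;=> 42
(instance? Foo (dissoc f :fred))   ;=> false, dissoc of a field key gives a plain map
(dissoc f :fred)                   ;=> {:ethel 42, :lucy "ricky", :extra :x}
```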
Indexed support
- support for nth
- would allow sequential [fred ethel lucy] destructuring
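A sketch of the nth idea, assuming released Clojure (1.2 and later) syntax and the clojure.lang.Indexed interface. Triple is a hypothetical type, and delegating to a temporary vector keeps the example short at the cost of an allocation per call.

```clojure
;; Assumption: released Clojure (1.2+) deftype syntax.
;; Indexed extends Counted, so count is implemented as well.
(deftype Triple [fred ethel lucy]
  clojure.lang.Indexed
  (nth [_ i] (nth [fred ethel lucy] i))
  (nth [_ i not-found] (nth [fred ethel lucy] i not-found))
  (count [_] 3))

;; Sequential destructuring goes through nth.
(let [[fred ethel lucy] (Triple. 1 42 "ricky")]
  [fred ethel lucy])   ;=> [1 42 "ricky"]
```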
Universal field identifiers
A problem with classes, IMO, is that each creates its own micro language. Getting methods out (see protocols) is a first step, but field names are still a problem, since each class defines its own scope. The 'name' field of 5 different classes might have the same semantics, or might not. Duck typing on local names is not good enough. Universal field identifiers would allow you to indicate that some field has some more universally agreed-upon semantics (e.g. an RDF URI).
- support supplying a URI string and/or keyword in addition to a field name
- #^{:tag String :alias ["http://xmlns.com/foaf/0.1/name" :foaf:name]} name
- lookups via uri or uri-key would return the corresponding (name) field
- TBD strings vs keywords?
- TBD multiple aliases?
- TBD print/read interaction
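Alias lookup could be approximated by hand with clojure.lang.ILookup. This is a sketch of the idea only, in released Clojure (1.2 and later) syntax; nothing here is generated from field metadata. The keyword :foaf/name is an assumed stand-in for the :foaf:name alias above, and Person is a hypothetical type.

```clojure
;; Assumption: released Clojure (1.2+) deftype syntax; the aliases are
;; hard-coded here instead of being read from #^{:alias ...} metadata.
(deftype Person [nm]
  clojure.lang.ILookup
  (valAt [this k] (get this k nil))
  (valAt [_ k not-found]
    (case k
      ;; local key, keyword alias and URI string all reach the same field
      (:name :foaf/name "http://xmlns.com/foaf/0.1/name") nm
      not-found)))

(def p (Person. "Ann"))

(:name p)                                  ;=> "Ann"
(:foaf/name p)                             ;=> "Ann"
(get p "http://xmlns.com/foaf/0.1/name")   ;=> "Ann"
```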
Constraints
- Support for logical/range/etc constraints on fields or the entire object
- TBD - a single validation fn?
- Could be set at class/static level, so no additional runtime storage
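Until something like this exists, a single validation step can be approximated with a factory function carrying :pre conditions. This is a workaround sketch in released Clojure (1.3 or later), not the constraint feature proposed above. Account and account are hypothetical names.

```clojure
;; Assumption: released Clojure 1.3+; validation lives in a factory fn,
;; so it adds no per-instance storage but can be bypassed by calling
;; the constructor directly.
(defrecord Account [owner balance])

(defn account [owner balance]
  {:pre [(string? owner)
         (number? balance)
         (not (neg? balance))]}
  (->Account owner balance))

(account "ann" 10)   ;=> #user.Account{:owner "ann", :balance 10} at a user REPL
;; (account "ann" -1) throws AssertionError while *assert* is true
```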
Equality given mutable fields like arrays, and objects with broken equals()
- proper and common use in Clojure impl is to treat array contents as immutable, and equals() as correct, but no guarantee, only identity compare is universally safe
- by default, presumes immutability, but can override with metadata:
- #^{:equals :identity|:ignore} foo
- to use identity, or ignore, respectively, in both equals and hashCode, for field foo
- no - just use = per field and define equals to override
- defining equals (see below) will cause any :equals hints to be ignored (warn?)