pjmlp 16 hours ago [-]
The natural evolution of compiler toolchains that live long enough on top of LLVM, eventually every one matures into having their own IR.
Even clang is now in the process of doing the same.
> We're going to use Clojure JVM to get our baseline benchmark numbers and then we'll aim to beat those numbers with jank.
> Note that all numbers in this post are measured on my five year old x86_64 desktop with an AMD Ryzen Threadripper 2950X on NixOS with OpenJDK 21. When I say "JVM" in this post, I mean OpenJDK 21.
In 2026, a better baseline would be the Java 26 implementations of OpenJDK, OpenJ9, and GraalVM, with JIT cache across several execution runs.
> In the native world, we don't currently have JIT optimization. It could exist, but LLVM doesn't have any implementation for it and neither does any major C or C++ compiler
Yes they kind of have, that is partially what PGO is used for, to get the program behaviour during training runs, and feed it back into the compilation toolchain.
Also while it isn't native code per se, when targeting bytecode environments like IBM i, WebAssembly, CLR, among others, with C or C++, there is certainly the possibility of having a JIT in the picture.
> Finally, just because jank is written in C++ doesn't mean that we can escape Clojure's semantics. Clojure is dynamically typed, garbage collected, and polymorphic as all get out.
Which is why benchmarks should also take into account compilers for Common Lisp and Scheme.
Anyway, great piece of work, and it was a very interesting post to read, best wishes to the author finding some support.
stavros 14 hours ago [-]
Isn't the main benefit of LLVM that you get tons of backends for free? What does having your own IR give you that's worth this tradeoff?
eigenspace 13 hours ago [-]
These compilers aren't replacing LLVM, they are adding a compilation step with its own IR where they do certain optimizations and translations *before* handing things off to LLVM.
Basically, the idea is to do as much 'high level' optimization and transformation stuff as you can in your own IR, and then let LLVM handle the low-level stuff and the targeting of specific hardware vendors.
stavros 13 hours ago [-]
That makes sense, thanks. Is this IR at a level where the optimisations can't just be added to LLVM then?
eigenspace 13 hours ago [-]
I don't know much about Jank's implementation, but I can speak to how it's done in Julia (dynamic, high performance language with lispy semantics but matlaby syntax, JIT compiled to LLVM).
I think the big thing is just that LLVM can't really be made to closely model everyone's different weird language semantics. In practice, the less C-like your language is, the more hoops you will likely need to jump through in order to prepare your code to be handed off to LLVM if you want to get a good result out of it. Otherwise it just won't understand your code well enough to make good optimizations, or may not have the proper optimizations implemented.
Trying to modify LLVM to fit your purposes is a bit of an uphill battle too. You either have to try and convince all the stakeholders that each one of your proposed modifications is worth it (when they're typically just not needed by C-like languages), or you need to maintain a fork, which is a nightmare.
Like, just to take one example, Julia has a world-age system I describe here: https://news.ycombinator.com/item?id=48151251#48177215 which most other LLVM users would have no use for, and would just add complexity and overhead for them so I don't think any julia people ever even thought about trying to upstream that.
Julia is a somewhat extreme example. It actually has like 2.5 different IRs internally because it just does a lot of compiler transforms before handing things off to LLVM. We've generally just been on a trajectory of moving more and more stuff over to the Julia side because it gives us maximal control.
mswphd 6 hours ago [-]
Another example: For years rust was limited on performance optimizations in LLVM. Specifically, it was difficult to get LLVM to properly optimize for Rust's generated code, namely where one can make strong aliasing (and non-aliasing) statements using `noalias`. This is a (pre-existing iirc) LLVM attribute.
Despite being a pre-existing feature motivated by C-like languages, typical C/C++ code does not leverage this attribute that much. So there were a surprising number of bugs in the handling of the attribute, and it took a number of years (I didn't follow things closely, but >= 3 for sure, maybe as much as 6?) before they got ironed out enough where it could be enabled.
Jeaye 6 hours ago [-]
As another said, jank is not replacing LLVM or LLVM IR. We still use LLVM IR! There is a diagram here which shows the pipeline: https://book.jank-lang.org/dev/ir.html
The main thing is that we just use our own IR first, to perform optimizations with contextual data which is gone by the time we get to LLVM IR. That's also why these optimizations are not practical to write in LLVM, since by the time we get to LLVM IR, we're too far separated from jank's AST with the high level semantics of Clojure.
So we just add an intermediate step. Once we have jank's AST, turn it into our own IR, do some optimizations on it for things that LLVM won't be able to see, and then hand it off to LLVM to do the rest.
stavros 6 hours ago [-]
Ahh OK, makes perfect sense, and interesting that that IR compiles to C++. Thanks for the info!
ramses0 11 hours ago [-]
In my uninformed opinion it's like the SIMD discussion from yesterday. Without their fancy SIMD library, the optimization [`sqrt(x) * sqrt(x)` === x] gets lost in a sea of C++ template incantations when using that SIMD library.
Similarly, perhaps, if there's some fancy observation of an invariant that can be made about `*.map(...)` that gets "lost in the sauce" once it's been lowered to the typical push/pop/loop mechanisms, then those higher level optimizations are better done in a language specific IR, not the "default" IR.
It's actually IR's all the way down if you think about it...
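A minimal sketch of that idea, assuming a toy expression tree made of tuples (the node names `mul`, `sqrt`, `var`, and `const` are made up for illustration and do not come from any real compiler). While the tree still says "sqrt", the rewrite is a one-line pattern match; after lowering to generic loops and intrinsics the pattern would be much harder to spot. The rewrite itself is only valid for non-negative `x` and ignores floating-point rounding.

```python
def simplify(expr):
    """Rewrite mul(sqrt(e), sqrt(e)) to e on a toy high-level IR.

    Nodes are tuples like ("mul", a, b), ("sqrt", a), ("var", "x").
    Assumes e >= 0 and ignores floating-point rounding.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf payload such as a variable name or a number
    op, *args = expr
    args = [simplify(a) for a in args]  # simplify children first
    if (
        op == "mul"
        and args[0] == args[1]
        and isinstance(args[0], tuple)
        and args[0][0] == "sqrt"
    ):
        return args[0][1]  # sqrt(e) * sqrt(e) becomes e
    return (op, *args)


x = ("var", "x")
product = ("mul", ("sqrt", x), ("sqrt", x))
simplified = simplify(("add", product, ("const", 1)))
```

Here `simplified` ends up as the tree for `x + 1`, with both square roots gone.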
adgjlsfhk1 12 hours ago [-]
one example of this is type inference. llvm is a statically typed ir, so if you're compiling to it from a language with an expressive type system (dynamically typed or statically typed with generics), you need to do your type inference pre llvm.
amelius 13 hours ago [-]
I suppose you can always translate your own IR to the one supported by LLVM. That would be the first backend I'd write, if I was making my own IR.
christophilus 23 hours ago [-]
> we're using it to optimize jank to compete with the JVM
The JVM gets a lot of hate, but that is a very high bar. The JVM is a serious piece of kit. I hope Jank succeeds. I'd love to use it in real projects.
drob518 11 hours ago [-]
Indeed. Most of the hate is due to slow start-up time, but once it gets warmed up, the modern JVM has state of the art dynamic compilation and GC. Thousands of man years have been spent getting it to that point.
arikrahman 7 hours ago [-]
With Project Leyden being implemented and JEP 516 coming very soon, those worries will be a thing of the past. Now you can get incredible AOT performance without having to depend on Babashka or GraalVM workarounds.
drob518 7 hours ago [-]
Yep, and I’m totally looking forward to it.
pjmlp 16 hours ago [-]
Additionally, there are many JVMs to choose from. Many people make the mistake of equating the JVM with OpenJDK, which is like talking about C and only considering GCC.
Other JVMs have plenty of goodies: some have had AOT compilation for about 20 years now, others have real-time GC, others had JIT caches before Project Leyden was even an idea, and others offer value types as an experiment (ObjectLayout on Azul), pauseless GC, cloud-based JIT compilers, or bare-metal deployments. ART also has its goodies, somehow, despite everything. There is a whole world that is lost when people focus too much on JVM == OpenJDK.
let_rec 15 hours ago [-]
On the other hand, the JVM spec may prohibit some optimizations you are after. It's very dynamic after all!
pjmlp 14 hours ago [-]
Not really, that is the usual argument why CPython is slow.
If anything runtimes like the various JVM implementations, alongside the CLR and JS engines as well, are the bleeding edge of dynamic compiler optimizations with dynamic runtimes.
That is something that gets lost when talking about Java, yes the programming language looks like C++, however the JVM itself is heavily inspired by Smalltalk and Objective-C dynamic semantics.
Coming back to the spec, you will notice that it doesn't mention how threads are implemented, what kind of AOT/JIT is available, or what GC algorithms to implement, leaving enough room for implementations.
One area where you are actually right, that I just remembered while typing this, is the way reflection or unsafe code hinders some optimizations. Hence the ongoing steps: enabling JNI or FFM has to be explicit at startup, dynamic agents also have to be explicitly enabled, and the upcoming "final means final" (no more changing final fields via reflection).
truth_seeker 14 hours ago [-]
what really matters is:
how far can I get in X programming language by writing just idiomatic code?
how much do the SDK and community libs and frameworks help me run my program at bare metal speed?
what sort of changes do I have to make to existing libs, frameworks, and my legacy code for CPU, IO, and memory efficiency as I migrate to a new version?
pjmlp 14 hours ago [-]
That is only part of the picture, the other part that seems quite forgotten nowadays is:
- how much do people actually care about algorithms and data structures
- do they actually know what options their tools have available
- have they ever spent at least an hour reading the man pages, info pages, or HTML documentation
- have they ever used a profiler, a graphical debugger, an advanced IDE
lemming 23 hours ago [-]
Great article, as always.
There is one thing that I think is important to bear in mind when discussing inlining, especially in the context of Clojure. This is that once a function has been inlined, you can no longer update the definition of that function in the REPL and have that update the behaviour of functions which use it, unless you recompile those as well. This is not a criticism of course, it’s just part of the natural tension between dynamism and performance.
eigenspace 14 hours ago [-]
Julia actually has some really cool machinery for handling this that I would encourage other JIT languages to copy.
Whenever you call a function, that function and any calls in that call stack occur in a 'fixed world age'. Within a given world-age, method tables and global constants are all fixed, and the language can be analyzed like it's statically typed (there are escape hatches like `invoke_in_world` and `invokelatest`).
Between world-ages, things are allowed to change. When a function calls another function, we add a 'backedge' from the caller to the callee.
So if I have `f(x) = g(h(x))`, and I redefine `h`, we then say it's no longer valid, and then we look at the backedge that leads from `h` to `g` and say the old definition of `g` is also no longer valid, and then we go from `g` to `f` and also invalidate the old definition of `f`.
This means that once `f` is called in a new world age (the world-age gets incremented every time a new method is (re)defined, or if a global const is changed / defined), the compiler knows that it has to recompile `f`, `g`, and `h`. What's especially cool is that this system works regardless of inlining, and it allows us to safely do all sorts of interprocedural optimizations, but in a JIT compiled language.
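The backedge bookkeeping described above can be sketched in a few lines. This is a toy model under stated assumptions, not Julia's actual data structures: the `World` class, its method names, and the `build(deps)` convention are all hypothetical. "Compiling" a function bakes the current compiled callees into a closure (a stand-in for inlining), so a stale caller would keep calling the old callee unless invalidation followed the backedges. For simplicity the demo uses a straight call chain, `f` calls `g` and `g` calls `h`.

```python
from collections import defaultdict


class World:
    """Toy model of world ages and backedge-based invalidation."""

    def __init__(self):
        self.age = 0
        self.sources = {}                  # name -> (build function, callee names)
        self.compiled = {}                 # name -> compiled callable
        self.backedges = defaultdict(set)  # callee -> callers that baked it in

    def define(self, name, build, calls=()):
        """(Re)define a function: bump the world age, invalidate dependents."""
        self.age += 1
        self.sources[name] = (build, tuple(calls))
        self._invalidate(name)

    def _invalidate(self, name):
        # Drop the cached code, then walk backedges to every caller.
        self.compiled.pop(name, None)
        for caller in self.backedges.pop(name, set()):
            self._invalidate(caller)

    def compile(self, name):
        """Compile `name`, baking in its callees and recording backedges."""
        if name not in self.compiled:
            build, calls = self.sources[name]
            deps = {}
            for callee in calls:
                deps[callee] = self.compile(callee)
                self.backedges[callee].add(name)
            self.compiled[name] = build(deps)
        return self.compiled[name]

    def call(self, name, x):
        return self.compile(name)(x)


w = World()
w.define("h", lambda deps: lambda x: x + 1)
w.define("g", lambda deps: lambda x: deps["h"](x) * 2, calls=("h",))
w.define("f", lambda deps: lambda x: deps["g"](x) + 1, calls=("g",))

before = w.call("f", 1)                       # compiles h, g, f
w.define("h", lambda deps: lambda x: x + 10)  # invalidates h, then g, then f
after = w.call("f", 1)                        # recompiles against the new h
```

Without the `_invalidate` walk, `g` would still hold the old `h` in its closure and `after` would equal `before`.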
lemming 12 hours ago [-]
That is very cool indeed. Are there limitations that this imposes? Is Julia a whole world compiler or does it support partial compilation?
eigenspace 12 hours ago [-]
There are two main limitations:
1. If you try and re-define a global constant or add new methods inside a running program using `eval` or whatever, then your running program won't see those changes until it advances the world-age (i.e. either by using `invokelatest`, or by returning to the top-level scope). Note though that things like closures and defining functions within functions are fine, you just can't do an arbitrary `eval` to define something completely dynamically.
2. Method invalidations can cause a lot of compilation latency. If you load a package that invalidates a bunch of already compiled methods, then those methods will later need to be recompiled, which means you hit some more compiler latency than expected. These invalidations can have false positives too, so sometimes more methods get invalidated than you'd want.
__________________________
> Is Julia a whole world compiler or does it support partial compilation?
On the LLVM side, we only do partial compilation. Every function method specialization in each different world (modulo inlining) is its own LLVM module that gets compiled in parallel by LLVM. Non-inlined function calls then involve linking these modules.
On the julia-side with our own custom internal IRs though, that's where we perform whole-world style interprocedural optimizations and inlining before handing the individual compilation units to LLVM. At least if I'm using "whole world" right here. What I mean is essentially everything statically known to be reachable from a compilation unit's entry-point given its signature. If by "whole world" you mean compiling every possible method signature, that's not possible in julia at all, because the space of possible method specializations is infinite due to parametric types.
We generally get the best of both worlds with these two approaches (at the cost of just using a lot of space to store all the different possible specializations and all of our different IRs and different pieces of machinery).
Jeaye 6 hours ago [-]
Hey lemming! You're right, which is why it should be used sparingly. Since clojure.core is compiled (on the JVM) with direct linking, reacting to var changes isn't an intended concern, since they're not going to work properly throughout any clojure.core code using that var. This makes it a good candidate ns for inlining things. But users shouldn't just be doing this for their normal application vars without giving it due consideration.
thfuran 22 hours ago [-]
Does that not happen automatically? I know there are contexts in which jvm will deoptimize inlining and recompile, like in response to class loading that causes a call site that was previously provably monomorphic to no longer be.
lemming 20 hours ago [-]
No, it doesn't. In JVM Clojure's case, the vars are usually compiled to the moral equivalent of a global variable holding a pointer to a function. This allows you to update the function if the developer redefines it in the REPL, but it comes at a performance cost (the JVM can't inline it or otherwise optimise it). Clojure also allows you to compile with "direct linking", e.g. for production deployments, where you know you're unlikely to be wanting to dynamically update the code. In those cases defns are compiled down to static methods which call each other - much faster since the JVM can perform its magic with them, but you can't update them at the REPL.
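A rough illustration of that tradeoff, written as a sketch rather than as how Clojure actually emits bytecode. The `Var` class and the two `make_caller_*` helpers are hypothetical names: one caller dereferences the var on every call, the other resolves the target once, the way direct linking binds a call site.

```python
class Var:
    """A mutable cell holding the current function, loosely like a var root."""

    def __init__(self, fn):
        self.root = fn


def make_caller_indirect(var):
    # Dereference the var on every call: redefinitions are visible,
    # but the call target is never a fixed, inlinable function.
    return lambda x: var.root(x) + 1


def make_caller_direct(var):
    # Resolve the target once at "compile" time, as direct linking would:
    # the call is cheap and fixed, but later redefinitions are not seen.
    target = var.root
    return lambda x: target(x) + 1


power = Var(lambda x: x * x)
indirect = make_caller_indirect(power)
direct = make_caller_direct(power)

power.root = lambda x: x * x * x  # "redefine the function at the REPL"

seen_by_indirect = indirect(3)  # uses the new definition
seen_by_direct = direct(3)      # still uses the old definition
```

After the redefinition only the indirect caller picks up the new body, which is the REPL behaviour you give up in exchange for the faster direct call.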
I'm unsure exactly how jank works WRT this tradeoff, but the article makes it sound like it's closer to the direct linking version, but with the inlining etc being done by jank rather than the JVM. I don't know if this is only for AOT or also in JIT cases.
arvyy 13 hours ago [-]
> the vars are usually compiled to the moral equivalent of a global variable holding a pointer to a function. This allows you to update the function if the developer redefines it in the REPL, but it comes at a performance cost (the JVM can't inline it or otherwise optimise it)
might be out of my depth but I find it surprising; I thought compilation through invokedynamic should be able to handle redefinition while still allowing inlining and other jit optimizations
lemming 12 hours ago [-]
Clojure (AFAIK) does not use invokedynamic, except perhaps in the latest version for some of the new interop stuff. It still officially supports JVM 1.8 bytecode. It’s a language which greatly values stability and backwards compatibility, so it’s been very slow to adopt newer JVM features.
kbolino 10 hours ago [-]
Though there may be other reasons not to use it, invokedynamic is not a new feature of the JVM. If they're targeting 1.8 binary compatibility, they certainly have it at their disposal, since it landed in 1.7.
mccoyb 6 hours ago [-]
Is that really true? Can't you track invalidations via a dependency graph?
Right, as you said, you'd have to recompile dependents.
adgjlsfhk1 30 minutes ago [-]
That's what Julia does. It works pretty well.
iLemming 4 hours ago [-]
Very cool. My immediate thought was: could this open possibilities to compete with Rust on wasm targets? Upon reviewing it more I realized: probably not. Jank (or any other Clojure dialect) won't really be in a position to beat Rust there, and the reasons are structural, not just engineering effort. We need GC and, well, that's the biggest elephant.
But to be completely honest, the question: "do you need wasm at all...?", should be always followed by "why?". For like 95% of cases, Clojurescript saves you weeks/months of work. Easier to build, easier to maintain. That's subjective, of course. Most Rustaceans don't even want to try Clojure. Most Clojurists find Rust to be needlessly complex.
CalChris 21 hours ago [-]
The natural question is why doesn't Jank use MLIR?
Jeaye 6 hours ago [-]
I spoke with a couple Clang and LLVM devs about MLIR when I was doing the original design for jank's IR. The general consensus was that MLIR added a great deal of complexity on top of designing/implementing an IR and nobody was confident it was actually worth the effort. Since I knew exactly what I wanted, I just built that.
CalChris 58 minutes ago [-]
Your custom IR is above LLVM’s IR, correct? Is it like SwiftIR then? Maybe you could add a paragraph or two going through that design decision.
pjmlp 16 hours ago [-]
No language using MLIR uses it directly out of the box, in that sense the right question is why did Jank not create their MLIR dialect.
debugnik 13 hours ago [-]
MLIR dialects have to be lowered into the basic LLVM one eventually, don't they? Does MLIR add anything over a custom IR for host languages that aren't deficient at manipulating data structures?
eigenspace 13 hours ago [-]
'MLIR dialects' is just a term for teaching MLIR how to manipulate and understand your own custom IR.
MLIR is just very good at producing good vectorized code in the presence of stuff like nested loops compared to LLVM or even some of the most carefully crafted custom compilers. It's not about whether your custom compiler is 'deficient' at handling data structures, MLIR is just genuinely very good at some of this stuff compared to basically anyone else.
For most projects it's just more trouble than it's worth though, because maintaining and using an MLIR dialect definition is hard.
debugnik 12 hours ago [-]
But AFAIK those aren't features of MLIR, but of lowering to existing MLIR dialects and running their passes. My genuine question is whether these passes provide any benefit before lowering, because otherwise a custom dialect doesn't add anything over lowering from a custom IR for anyone not using C++; and the only example I've seen is forced inlining.
erichocean 12 hours ago [-]
I suspect the author created his own IR after being offered that suggestion earlier, he's definitely aware it exists.
> Clojure's dynamism is granted by a great deal of both polymorphism and indirection, but this means LLVM has very few optimization opportunities when it's dealing with the LLVM IR from jank.
In my mind, what is happening here is you lower Clojure code into LLVM, with a bunch of runtime calls (e.g. your `jank::runtime::dynamic_call`) (e.g. LLVM invoking the runtime over a C ABI).
If that's true, are there any optimizations that LLVM helps out with? Perhaps like DCE? I can't tell immediately, curious about the answer
(question is obviously about the pre-IR state of things)
codebje 21 hours ago [-]
The article talks about inlining a two-arity call to clojure.core/max to instead be an explicit call to cpp/jank.runtime.max, eliminating the unnecessary argument count matching and recursion portions of the Clojure function.
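To make the shape of that rewrite concrete, here is a small sketch over a made-up call-tree IR. Nothing here is jank's real IR or API: the tuple node format and the names `variadic_max`, `runtime_max`, and `inline_two_arity` are assumptions for illustration. The pass replaces a two-argument call to the multi-arity function with a direct call to the binary runtime function, skipping the argument-count matching.

```python
def runtime_max(a, b):
    """Stand-in for the binary runtime function."""
    return a if a > b else b


def variadic_max(*args):
    """Shape of a multi-arity function: match the count, then recurse."""
    if len(args) == 1:
        return args[0]
    if len(args) == 2:
        return runtime_max(args[0], args[1])
    return variadic_max(runtime_max(args[0], args[1]), *args[2:])


FUNCTIONS = {"max": variadic_max, "runtime_max": runtime_max}


def inline_two_arity(node):
    """Rewrite ("call", "max", [a, b]) into ("call", "runtime_max", [a, b])."""
    if not isinstance(node, tuple) or node[0] != "call":
        return node  # literals pass through untouched
    _, name, args = node
    args = [inline_two_arity(a) for a in args]
    if name == "max" and len(args) == 2:
        name = "runtime_max"  # arity is known statically, skip the dispatch
    return ("call", name, args)


def evaluate(node):
    """Tiny interpreter so the original and rewritten trees can be compared."""
    if not isinstance(node, tuple):
        return node
    _, name, args = node
    return FUNCTIONS[name](*[evaluate(a) for a in args])


tree = ("call", "max", [("call", "max", [1, 5]), 3, 2])
rewritten = inline_two_arity(tree)
```

The inner two-argument call is redirected while the outer three-argument call is left alone, and both trees evaluate to the same value.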
It also mentions that in Clang the runtime max function will itself be inlined, so that's something LLVM ("the LLVM project", anyway) is still doing - and beyond that, as written this IR is likely to leave behind plenty of opportunities for LLVM to do the things it's good at: DCE, load/store optimisation, constant propagation, etc. And register allocation.
The jank::runtime::max call is itself complex: it's got to type check its arguments and work out what to actually do based on the two types; if parts of these tests are done before the inlined call to max there's a fair chance that LLVM will be able to eliminate their repetition and slim it all down a long way. In the Fibonacci example the fact that a previous test will have likely identified whether the argument is an int or something else should hopefully carry over for ::lte, ::sub, and ::add and simplify those down to just the single operator call - but sadly I suspect it won't at least for the addition, because the recursive call will lose the information that the return value when called with a tagged integer is always a tagged integer.
A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR (:metadata tag functions as specialised for <type> with the new entry point, if a function only calls specialised functions (and itself) it too can be specialised, and a heuristic to determine if specialisation gains enough to sacrifice space for it).
Jeaye 6 hours ago [-]
The first three paragraphs here are on point! jank's IR passes will not worry much about things like load/store optimization, register allocation, inlining C++ functions, etc. These are in LLVM's domain. We just worry about the Clojure side of things. Polymorphic math is intense, but we do our best to avoid the extra work by unboxing whenever possible.
> A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR
All of these math functions are templates with four specific categories:
1. Object and object
2. Primitive and primitive
3. Primitive and object
4. Object and primitive
We handle the difference between typed objects (like integer_ref) and type-erased objects (object_ref) as well. This template then gets inlined, which is exactly what the last step of the benchmark optimizations (adding annotations) ensured. The return type of these functions will prefer primitive types, rather than automatically boxing. jank's analyzer tracks all types used, at compile-time, and supports automatic boxing. This means that we're already using the most optimal primitive math whenever we can and that it will indeed inline to just an operator call when working on two primitives, or two typed objects, or a combination thereof.
Probably a stupid question, but is LLVM better at optimising its IR than C compilers are at optimising C? Asked another way, why not use C as an IR, if it's compatible with your language semantics?
eigenspace 12 hours ago [-]
LLVM is essentially what you get when you say "I want to use C as an IR", and then try and do it for a bit and say "hmm, okay I'd like to put some restrictions on this IR... and maybe some customization hooks... and maybe this feature..."
adgjlsfhk1 12 hours ago [-]
C is a really bad IR for a lot of reasons. It has incredibly opinionated semantics (e.g. the huge amount of UB and platform-specific behavior). LLVM is a lot more verbose, but allows you to actually pick the semantics you want.
Even clang is now in the process of doing the same.
> We're going to use Clojure JVM to get our baseline benchmark numbers and then we'll aim to beat those numbers with jank.
> Note that all numbers in this post are measured on my five year old x86_64 desktop with an AMD Ryzen Threadripper 2950X on NixOS with OpenJDK 21. When I say "JVM" in this post, I mean OpenJDK 21.
In 2026, a better baseline would be the Java 26 implementations of OpenJDK, OpenJ9, and GraalVM, with JIT cache across several execution runs.
> In the native world, we don't currently have JIT optimization. It could exist, but LLVM doesn't have any implementation for it and neither does any major C or C++ compiler
Yes they kind of have, that is partially what PGO is used for, to get the program behaviour during training runs, and feed it back into the compilation toolchain.
Also while it isn't native code per se, when targeting bytecode environments like IBM i, WebAssembly, CLR, among others, with C or C++, there is certainly the possibility of having a JIT in the picture.
> Finally, just because jank is written in C++ doesn't mean that we can escape Clojure's semantics. Clojure is dynamically typed, garbage collected, and polymorphic as all get out.
Which is why, benchmarks should also take into account compilers for Common Lisp and Scheme compilers.
Anyway, great piece of work, and it was a very interesting post to read, best wishes to the author finding some support.
Basically, the idea is to do as much 'high level' optimization and transformation stuff as you can in your own IR, and then let LLVM handle the low-level stuff and the targeting of specific hardware vendors.
I think the big thing is just that LLVM can't really be made to closely model everyone's different weird langauge semantics. In practice, the less C-like your language is, the more hoops you will likely need to jump through in order to prepare your code to be handed off to LLVM if you want to get a good result out of it, otherwise it just wont understand your code well enough to make good optimizations, or may not have the proper optimizations implemented.
Trying to modify LLVM to fit your purposes is a bit of an uphill battle too. You either have to try and convince all the stakeholders that each one of your proposed modifications are worth it (when they're typically just not needed by C-like languages), or you need to maintain a fork which is a nightmare.
Like, just to take one example, Julia has a world-age system I describe here: https://news.ycombinator.com/item?id=48151251#48177215 which most other LLVM users would have no use for, and would just add complexity and overhead for them so I don't think any julia people ever even thought about trying to upstream that.
Julia is a somewhat extreme example. It actually has like 2.5 different IRs internally because it just does a lot of compiler transforms before handing things off to LLVM. We've generally just been on a trajectory of moving more and more stuff over to the Julia side because it gives us maximal control.
Despite being a pre-existing feature motivated by C-like languages, typical C/C++ code does not leverage this attribute that much. So there were a surprising number of bugs in the handling of the attribute, and it took a number of years (I didn't follow things closely, but >= 3 for sure, maybe as much as 6?) before they got ironed out enough where it could be enabled.
The main thing is that we just use our own IR first, to perform optimizations with contextual data which is gone by the time we get to LLVM IR. That's also why these optimizations are not practical to write in LLVM, since by the time we get to LLVM IR, we're too far separated from jank's AST with the high level semantics of Clojure.
So we just add an intermediate step. Once we have jank's AST, turn it into our own IR, do some optimizations on it for things that LLVM won't be able to see, and then hand it off to LLVM to do the rest.
Similarly, perhaps, if there's some fancy observation of an invariant that can be made about `*.map(...)` that gets "lost in the sauce" once it's been lowered to the typical push/pop/loop mechanisms, then those higher level optimizations are better done in a language specific IR, not the "default" IR.
It's actually IR's all the way down if you think about it...
The JVM gets a lot of hate, but that is a very high bar. The JVM is a serious piece of kit. I hope Jank succeeds. I'd love to use it in real projects.
Other JVMs have plenty of goodies, some of them have AOT for about 20 years now, others real time GC, other ones JIT caches before Project Leyden was even an idea, others actual value types as experiment (ObjectLayout on Azul), pauseless GC, cloud based JIT compilers, bare metal deployments, ART also has its goodies somehow despite everything, there is a whole world that is lost when people focus too much on JVM == OpenJDK.
If anything runtimes like the various JVM implementations, alongside the CLR and JS engines as well, are the bleeding edge of dynamic compiler optimizations with dynamic runtimes.
That is something that gets lost when talking about Java, yes the programming language looks like C++, however the JVM itself is heavily inspired by Smalltalk and Objective-C dynamic semantics.
Coming back to the spec, you will notice that it doesn't mention how threads are implemented, what kind of AOT/JIT are available, or what GC algorithms to implement, leaving enough room space for implementations.
One area where you are actually right, that I just remembered while typing this, are the way reflection or unsafe code hinders some optimizations, hence the ongoing steps that enabling JNI or FFM has to be explicit at startup, dynamic agents also have to be expliclity enabled, and the upcoming final means final (no more changing final fields via reflection).
how far can i get in X programming language by writing just idiomatic code?
how much of SDK and community libs, frameworks help me run my program at bare metal speed ?
What sort of change i have to do exisitng libs, frameworks and my legacy code for CPU, IO and memory efficiency as a migrate to new version ?
- how much people actually care about algorithms and data structures
- do they actually know what options their tools have available
- have they ever spend at least an hour reading the man pages, info page or HTML documentations
- have they ever used a profiler, a graphical debugger, an advanced IDE
There is one thing that I think is important to bear in mind when discussing inlining, especially in the context of Clojure. This is that once a function has been inlined, you can no longer update the definition of that function in the REPL and have that update the behaviour of functions which use it, unless you recompile those as well. This is not a criticism of course, it’s just part of the natural tension between dynamism and performance.
Whenever you call a function, that function and any calls in that call stack occur in a 'fixed world age'. Within a given world-age, method tables and global constants are all fixed, and the langauge can be analyzed like it's statically typed (there are escape hatches like `invoke_in_world`, and `invokelatest`)
Between world-ages, things are allowed to change. When a function calls another function, we add a 'backedge' from the caller to the callee.
So if I have `f(x) = g(h(x))`, and I redefine `h`, we then say it's no longer valid, and then we look at the backedge that leads from `h` to `g` and say the old definition of `g` is also no longer valid, and then we go from `g` to `f` and also invalidate the old definition of `f`.
This means that once `f` is called in a new world age (the world-age gets incremented every time a new method is (re)defined, or if a global const is changed / defined), the compiler knows that it has to recompile `f`, `g`, and `h`. What's especially cool is that this system works regardless of inlining, and it allows us to safely do all sorts of interproceedural optimizations, but in a JIT compiled language.
1. If you try and re-define a global constant or add new methods inside a running program using `eval` or whatever, then your running program won't see those changes until it advances the world-age (i.e. either by using `invokelatest`, or by returning to the top-level scope). Note though that things like closures and defining functions within functions are fine, you just can't do an arbitrary `eval` to define something completely dynamically.
2. Method invalidations can cause a lot of compilation latency. If you load a package that invalidates a bunch of already compiled methods, then those methods will later need to be recompiled, which means you hit some more compiler latency than expected. These invalidations can have false positives too, so sometimes more methods get invalidated than you'd want.
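The first point can be illustrated with a small Python model. The `World` class and `run_program` are hypothetical, meant only to show the visibility rule: a running program is pinned to the world age it started in, and only a lookup at the latest age (the analogue of `invokelatest`) sees a definition added mid-run.

```python
class World:
    """Toy method table that remembers when each definition was added."""

    def __init__(self):
        self.age = 0
        self.history = []  # (age, name, function), oldest first

    def define(self, name, fn):
        self.age += 1  # every (re)definition advances the world age
        self.history.append((self.age, name, fn))

    def lookup(self, name, age):
        # Return the newest definition that already existed at `age`.
        for defined_at, defined_name, fn in reversed(self.history):
            if defined_name == name and defined_at <= age:
                return fn
        raise KeyError(name)


def run_program(world):
    frozen_age = world.age  # the running program is pinned to this age
    world.define("greet", lambda: "new")          # eval-style redefinition mid-run
    pinned = world.lookup("greet", frozen_age)()  # ordinary call: old world
    latest = world.lookup("greet", world.age)()   # analogue of invokelatest
    return pinned, latest


world = World()
world.define("greet", lambda: "old")
result = run_program(world)
print(result)  # ('old', 'new')
```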
__________________________
> Is Julia a whole world compiler or does it support partial compilation?
On the LLVM side, we only do partial compilation. Every function method specialization in each different world (modulo inlining) is its own LLVM module that gets compiled in parallel by LLVM. Non-inlined function calls then involve linking these modules.
On the julia-side with our own custom internal IRs though, that's where we perform whole-world style interprocedural optimizations and inlining before handing the individual compilation units to LLVM. At least if I'm using "whole world" right here. What I mean is essentially everything statically known to be reachable from a compilation unit's entry-point given its signature. If by "whole world" you mean compiling every possible method signature, that's not possible in julia at all, because the space of possible method specializations is infinite due to parametric types.
We generally get the best of both worlds with these two approaches (at the cost of just using a lot of space to store all the different possible specializations and all of our different IRs and different pieces of machinery).
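As a loose illustration of per-signature specialization and its storage cost, here is a Python sketch. The `Specializer` class is hypothetical and the "compile" step is a placeholder: the point is only that one entry is created lazily per distinct argument-type signature, never for every possible signature up front.

```python
class Specializer:
    """Caches one 'compiled' specialization per argument-type signature."""

    def __init__(self, generic_fn):
        self.generic_fn = generic_fn
        self.cache = {}  # signature tuple -> specialized callable

    def __call__(self, *args):
        signature = tuple(type(a) for a in args)
        if signature not in self.cache:
            # Specializations are created on demand, since the space of
            # possible signatures is unbounded.
            self.cache[signature] = self._compile(signature)
        return self.cache[signature](*args)

    def _compile(self, signature):
        # Placeholder for real code generation: wrap the generic function
        # and remember which signature it was built for.
        fn = self.generic_fn

        def specialized(*args):
            return fn(*args)

        specialized.signature = signature
        return specialized


add = Specializer(lambda a, b: a + b)
add(1, 2)
add(3, 4)      # reuses the (int, int) specialization
add(1.0, 2.0)  # creates a second specialization
print(len(add.cache))  # 2
```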
I'm unsure exactly how jank works WRT this tradeoff, but the article makes it sound like it's closer to the direct linking version, but with the inlining etc being done by jank rather than the JVM. I don't know if this is only for AOT or also in JIT cases.
might be out of my depth but I find it surprising; I thought compilation through invokedynamic should be able to handle redefinition while still allowing inlining and other jit optimizations
Right, as you said, you'd have to recompile dependents.
But to be completely honest, the question: "do you need wasm at all...?", should be always followed by "why?". For like 95% of cases, Clojurescript saves you weeks/months of work. Easier to build, easier to maintain. That's subjective, of course. Most Rustaceans don't even want to try Clojure. Most Clojurists find Rust to be needlessly complex.
MLIR is just very good at producing good vectorized code in the presence of stuff like nested loops compared to LLVM or even some of the most carefully crafted custom compilers. It's not about whether your custom compiler is 'deficient' at handling data structures, MLIR is just genuinely very good at some of this stuff compared to basically anyone else.
For most projects it's just more trouble than it's worth though, because maintaining and using an MLIR dialect definition is hard.
https://carp-lang.github.io/carp-docs/LanguageGuide.html
Has anyone been playing with it on HN ?
> Clojure's dynamism is granted by a great deal of both polymorphism and indirection, but this means LLVM has very few optimization opportunities when it's dealing with the LLVM IR from jank.
In my mind, what is happening here is you lower Clojure code into LLVM, with a bunch of runtime calls (e.g. your `jank::runtime::dynamic_call`) (e.g. LLVM invoking the runtime over a C ABI).
If that's true, are there any optimizations that LLVM helps out with? Perhaps like DCE? I can't tell immediately, curious about the answer
(question is obviously about the pre-IR state of things)
It also mentions that in Clang the runtime max function will itself be inlined, so that's something LLVM ("the LLVM project", anyway) is still doing - and beyond that, as written this IR is likely to leave behind plenty of opportunities for LLVM to do the things it's good at: DCE, load/store optimisation, constant propagation, etc. And register allocation.
The jank::runtime::max call is itself complex: it's got to type check its arguments and work out what to actually do based on the two types; if parts of these tests are done before the inlined call to max there's a fair chance that LLVM will be able to eliminate their repetition and slim it all down a long way. In the Fibonacci example the fact that a previous test will have likely identified whether the argument is an int or something else should hopefully carry over for ::lte, ::sub, and ::add and simplify those down to just the single operator call - but sadly I suspect it won't at least for the addition, because the recursive call will lose the information that the return value when called with a tagged integer is always a tagged integer.
A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR (:metadata tag functions as specialised for <type> with the new entry point, if a function only calls specialised functions (and itself) it too can be specialised, and a heuristic to determine if specialisation gains enough to sacrifice space for it).
> A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR
All of these math functions are templates with four specific categories:
1. Object and object
2. Primitive and primitive
3. Primitive and object
4. Object and primitive
We handle the difference between typed objects (like integer_ref) and type-erased objects (object_ref) as well. This template then gets inlined, which is exactly what the last step of the benchmark optimizations (adding annotations) ensured. The return type of these functions will prefer primitive types, rather than automatically boxing. jank's analyzer tracks all types used, at compile-time, and supports automatic boxing. This means that we're already using the most optimal primitive math whenever we can and that it will indeed inline to just an operator call when working on two primitives, or two typed objects, or a combination thereof.
You can see the code for this here: https://github.com/jank-lang/jank/blob/29c2adb344526d26c8e82...
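For readers who want the shape of the four categories without reading the C++, here is a loose Python sketch. It is not jank's code: `BoxedInt` and `add` are hypothetical, and the category is chosen at runtime here, whereas the comment above describes templates resolved at compile time. It only shows the four operand combinations and a result that stays primitive instead of being boxed automatically.

```python
class BoxedInt:
    """Hypothetical stand-in for a boxed runtime integer object."""

    def __init__(self, value):
        self.value = value


def unbox(x):
    # Primitives pass through untouched; boxed objects give up their payload.
    return x.value if isinstance(x, BoxedInt) else x


def add(a, b):
    a_boxed = isinstance(a, BoxedInt)
    b_boxed = isinstance(b, BoxedInt)
    if a_boxed and b_boxed:
        category = "object + object"
    elif not a_boxed and not b_boxed:
        category = "primitive + primitive"
    elif not a_boxed:
        category = "primitive + object"
    else:
        category = "object + primitive"
    # The result is a plain int in every category: no automatic boxing.
    return category, unbox(a) + unbox(b)


print(add(BoxedInt(1), BoxedInt(2)))  # ('object + object', 3)
print(add(1, 2))                      # ('primitive + primitive', 3)
print(add(1, BoxedInt(2)))            # ('primitive + object', 3)
print(add(BoxedInt(1), 2))            # ('object + primitive', 3)
```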