Show HN: GlyphLang – An AI-first programming language

While working on a proof of concept project, I kept hitting Claude's token limit 30-60 minutes into their 5-hour sessions. The accumulating context from the codebase was eating through tokens fast. So I built a language designed to be generated by AI rather than written by humans.

GlyphLang

GlyphLang replaces verbose keywords with symbols that tokenize more efficiently:

  # Python
  @app.route('/users/<id>')
  def get_user(id):
      user = db.query("SELECT * FROM users WHERE id = ?", id)
      return jsonify(user)

  # GlyphLang
  @ GET /users/:id {
    $ user = db.query("SELECT * FROM users WHERE id = ?", id)
    > user
  }

  @ = route, $ = variable, > = return

Initial benchmarks show ~45% fewer tokens than Python, ~63% fewer than Java.
In practice, that means more logic fits in context, and sessions stretch longer before hitting limits. The AI maintains a broader view of your codebase throughout.
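
If you want to sanity-check the numbers yourself, here's a rough sketch using OpenAI's tiktoken. It only covers one vendor's encodings (Claude and Gemini tokenize differently), so treat the output as a ballpark figure rather than a benchmark:

  # Rough token-count comparison with tiktoken (pip install tiktoken).
  # OpenAI encodings only; other vendors' tokenizers will give different counts.
  import tiktoken

  python_src = '''@app.route('/users/<id>')
  def get_user(id):
      user = db.query("SELECT * FROM users WHERE id = ?", id)
      return jsonify(user)
  '''

  glyph_src = '''@ GET /users/:id {
    $ user = db.query("SELECT * FROM users WHERE id = ?", id)
    > user
  }
  '''

  enc = tiktoken.get_encoding("o200k_base")  # GPT-4o-era encoding
  for name, src in [("Python", python_src), ("GlyphLang", glyph_src)]:
      print(f"{name}: {len(enc.encode(src))} tokens")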

Before anyone asks: no, this isn't APL with extra steps. APL, Perl, and Forth are symbol-heavy but optimized for mathematical notation, human terseness, or machine efficiency. GlyphLang is specifically optimized for how modern LLMs tokenize. It's designed to be generated by AI and reviewed by humans, not the other way around. That said, it's still readable enough to be written or tweaked if the occasion requires.

It's still a work in progress, but it's a usable language with a bytecode compiler, JIT, LSP, VS Code extension, PostgreSQL, WebSockets, async/await, generics.

Docs: https://glyphlang.dev/docs

GitHub: https://github.com/GlyphLang/GlyphLang

40 points | by goose0004 1 day ago

16 comments

  • noosphr 1 day ago
    I've found that short symbols cause collisions with other tokens in the LLM's vocabulary. It is generally much better to have long descriptive names for everything in a language than short ones.

    An example that shocked me was using an XML translation of C for better vector search. The lack of curly braces made the model return much more relevant code than using anything else, including enriching the database with ctags.

    >GlyphLang is specifically optimized for how modern LLMs tokenize.

    This is extremely dubious. The vocabulary of tokens isn't conserved inside model families, let alone entirely different types of models. The only thing they are all good at is tokenizing English.

    • goose0004 1 day ago
      The collision point is interesting, but I'd argue context disambiguates. If I'm understanding you correctly, I don't think a model is confused about whether it's looking at an email when `@` appears before a route pattern. These symbols are heavily represented in programming contexts (e.g. Python decorators, shell scripts, etc.), so LLMs have seen them plenty of times in code. I'd be interested if you shared your findings though! It's definitely an issue I'd like to avoid, or at least mitigate somewhat.

      Regarding tokenizer variance, that's an absolutely fair point that vocabularies differ, but the symbols GlyphLang uses are ASCII characters that tokenize as single tokens across the GPT-4, Claude, and Gemini tokenizers. The optimization isn't model-specific; rather, it's targeting the common case of "ASCII char = 1 token". I could definitely reword my post though - looking at it more closely, it does read more as "fix-all" rather than "fix-most".
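
      For what it's worth, here's the kind of quick check I'm describing. It only covers OpenAI's public encodings via tiktoken, so it doesn't settle the cross-model question on its own:

        # Check whether the core GlyphLang symbols encode as single tokens
        # in OpenAI's public encodings (pip install tiktoken). Anthropic and
        # Google tokenizers aren't covered here, so this is only a partial check.
        import tiktoken

        symbols = ["@", "$", ">", "{", "}", ":"]
        for name in ("cl100k_base", "o200k_base"):
            enc = tiktoken.get_encoding(name)
            print(name, {s: len(enc.encode(s)) for s in symbols})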

      Regardless, I'd genuinely be interested in seeing failure cases. It would be incredibly useful data to see if there are specific patterns where symbol density hurts comprehension.

      • sitkack 19 minutes ago
        If context disambiguates, then you have to use attention which is even more resource intensive.

        You want to be as state free as possible. Your tokenizer should match your vocab and be unambiguous. I think your goal is sound, but golfing for the wrong metric.

      • noosphr 8 hours ago
        Collision is perhaps the wrong word, but LLMs definitely have trouble disambiguating different symbols of a language that map to similar tokens.

        Way back in the GPT-3.5 days, I could never get the model to parse even the simplest grammar until I replaced the one-letter production rules with one-word production rules, e.g. S vs Start. A bit like how they couldn't figure out the number of r's in strawberry.

  • jy14898 1 day ago
    I feel like any deviation from the syntax LLMs are trained on is not productive.

    Sure you can represent the same code in fewer tokens but I doubt it'll get those tokens correct as often.

    • rubyn00bie 1 day ago
      Yeah, big plus one from me. I recently tried to investigate some sort of alternative encoding to/from “the prompt,” and was swiftly told that was both not possible and would work against me. As you pointed out, the LLMs are trained on language and language itself is often not terse. Trying to skirt that will cause the LLM to calculate the vectors poorly because the relation between the input tokens and its training data doesn’t really exist.
  • everlier 1 day ago
    Arguably, math notation and set theory already has everything that we need.

    For example see this prompt describing an app: https://textclip.sh/?ask=chatgpt#c=XZTNbts4EMfvfYqpc0kQWpsEc...

    • goose0004 1 day ago
      That's an awesome tool! I think textclip.sh solves a different problem though (correct me if I'm wrong - this is the first I've been exposed to it). Compression at the URL/transport layer helps with sharing prompts, but the token count still hits you once the text is decompressed and fed into the model. The LLM sees the full uncompressed text.

      The approach with GlyphLang is to make the source code itself token-efficient. When an LLM reads something like `@ GET /users/:id { $ user = query(...) > user }`, that's what gets tokenized (not a decompressed version). The reduced tokenization persists throughout the context window for the entire session.

      That said, I don't think they're mutually exclusive. You could use textclip.sh to share GlyphLang snippets and get both benefits.

      • everlier 1 day ago
        Yes, the tool here is just to share the prompt - sorry, the first one I had handy was the one describing the service itself.

        Here's it in plain text to be more visible:

        ``` textclip.sh→URL gen: #t=<txt>→copy page | ?ask=<preset>#t=→svc redirect | ?redirect=<url>#t=→custom(use __TEXT__ placeholder). presets∈{claude,chatgpt,perplexity,gemini,google,bing,kagi,duckduckgo,brave,ecosia,wolfram}. len>500→auto deflate-raw #c= base64url encoded, efficient≤16k tokens. custom redirect→local LLM|any ?param svc. view mode: txt display+copy btn+new clip btn; copy→clipboard API→"Copied!" feedback 2s. create mode: textarea+live counters{chars,~tokens(len/4),url len}; color warn: tokens≥8k→yellow,≥16k→red; url≥7k→yellow,≥10k→red. badge gen: shields.io md [!alt](target_url); ```

        It uses math notation to heavily compress the representation while keeping the information content relatively preserved (similarly to GlyphLang). Later, an LLM can comfortably use it to describe the service in detail and answer the user's questions about it. The same is applicable to arbitrary information, including source code/logic.

  • p0w3n3d 1 day ago
    I think the gain is very little. Almost every English word is one token, and the same goes for programming language keywords, so you're just replacing one keyword with another. The only gain in the example given is > instead of jsonify(), which would be ~4 tokens.

    Please check your idea against tiktokenizer.

    • p0w3n3d 1 day ago
      I've checked, and you get a 36 -> 30 token decrease but no human readability. Sounds like a poor trade.
      • goose0004 5 hours ago
        Looks like my tokenization review method was incorrect - honestly a little embarrassing on my part. It probably would have been a lot longer before I caught it on my own, so thanks for the comment!

        I did just go through and run equivalent code samples from the GlyphLang repo (vs. the sample code I posted, which I'm assuming you ran) through tiktoken, and found slightly lower percentages, but still not insignificant: on average 35% fewer tokens than Python and 56% fewer than Java. I've updated the README with the corrected figures and methodology if you want to check: https://github.com/GlyphLang/GlyphLang/blob/main/README.md#a...

  • spankalee 13 hours ago
    Interesting to see the difference in opinion on "AI-first".

    I'm working on what might be called an "AI-first" programming language too, but for syntax I'm focusing on familiarity. Both because I presume LLMs will have an easier time generating familiar code, and because humans will have an easier time reviewing it.

    Syntax is only a small portion of being AI friendly though, IMO. A huge part of my own effort is safety and compile-time feedback: a sound static type system, sandboxed execution, strong immutable patterns, linting, and advanced type system features like ADTs, distinct types, extension types, units of measure, etc.

  • tricorn 1 day ago
    Don't optimize the language to fit the tokens, optimize the tokens to fit the language. Tokenization is just a means to compress the text; use a lot of code in target languages to determine the tokenizing, then do the training using those tokens. More important is to have a language where the model can make valid predictions of what effective code will be. Models are "good" at Python because they see so much of it. To determine what language might be most appropriate for an AI to work with, you'd need to train multiple models, each with a tokenizer optimized for a language and training specifically targeting that language.

    One language I've had good success with, despite having low levels of training in it, is Tcl/Tk. As the language is essentially a wordy version of Lisp (despite Stallman's disdain for it), it is extremely introspective, with the ability to modify the language from within itself. It also has a very robust extensible sandbox, and is reasonably efficient for an interpreted language.

    I've written a scaffold that uses Tcl as the sole tool-calling mechanism, and despite a lot of training distracting the model towards Python and JSON, it does fairly well. Unfortunately I'm limited in the models I can use because I have an old 8GB GPU, but I was surprised at how well it manages to use the Tcl sandbox with just a relatively small system prompt. Tcl is a very regular language with a very predictable structure and seems ideal for an LLM to use for tool calling, note taking, context trimming, delegation, etc. I haven't been working on this long, but it is close to the point where the model will be able to start extending its own environment (with anything potentially dangerous needing human intervention).
  • jaggederest 1 day ago
    Funny, I've been noodling on something that goes the other direction - avoiding symbols as much as possible and trying to use full english words.

    Very underbaked but https://github.com/jaggederest/locque

    • goose0004 5 hours ago
      This is great! Looks significantly more verbose, though I admit I haven't looked through all of your documentation. I'm very interested in knowing how it's performing!
      • jaggederest 5 hours ago
        Claude is middling, codex is great with it - I think codex has significantly better math reasoning and it's all very mathy compared to e.g. typescript. Everything in the repo is LLM generated at the moment - I expect in the near future to have to start manually doing some things, but I'm not sure where the sticking point will be.

        It's already good enough that I'm thinking self-hosting will be relatively quick, which is a huge deal at least in my opinion. Having proper self-hosting locque-in-locque and tools in locque in the first ~6 months would be superlative.

  • jbritton 1 day ago
    I had a conversation with Claude about what language to work in. It was a web app and it led me to Typescript mainly because of the training data for the model, plus typing and being able to write pure functions. Haskell might have been preferred except for the lower amounts of training data.
  • momojo 1 day ago
    Great work!

    > In practice, that means more logic fits in context, and sessions stretch longer before hitting limits. The AI maintains a broader view of your codebase throughout.

    This is one of those 'intuitions' that I've also had. However, I haven't found any convincing evidence for or against it so far.

    In a similar vein, this is why `reflex`[0] intrigues me. IMO their value prop is "LLM's love Python, so let's write entire apps in python". But again, I haven't seen any hard numbers.

    Anyone seen any hard numbers to back this?

    [0] https://github.com/reflex-dev/reflex

  • omneity 1 day ago
    Do you have any evals on how good LLMs are at generating Glyphlang?

    I’m curious if you optimized for the ability to generate functioning code or just tokenization compression rate, which LLMs you tokenized for, and what was your optimization process like.

  • 29athrowaway 1 day ago
    This could be an IR rather than a high level language.
  • DonHopkins 1 day ago
    Instead of making up new languages, just clean up code in old programming languages so it doesn't smell so bad! ;)

    Sniffable Python: useful for Anthropic skill sister scripts, and in general.

    https://github.com/SimHacker/moollm/tree/main/skills/sniffab...

  • DonHopkins 1 day ago
    What about the cost of the millions of tokens you have to spend to prompt the LLM to understand your bespoke language with manuals and tutorials and examples and stack overflow discussions and the source code to the compiler, added to every single prompt, that it totally forgets after each iteration?

    It already knows python and javascript and markdown and yaml extremely well, so it requires zero tokens to teach it those languages, and doesn't need to be completely taught a new language it's never seen before from the ground up each prompt.

    You are treating token count as the only bottleneck, rather than comprehension fidelity.

    Context window management is a real problem, and designing for generation is a good instinct, but you need to design for what LLMs are already good at, not design a new syntax they have to learn.

    jaggederest's opposite approach (full English words, locque) is actually more aligned with how LLMs work -- they're trained on English and understand English-like constructs deeply.

    noosphr's comment is devastating: "Short symbols cause collisions with other tokens in the LLM's vocabulary." The @ in @ GET /users/:id activates Python decorator associations, shell patterns, email patterns, and more. The semantic noise may outweigh the token savings.

    Perl's obsessive fetish for compact syntax, sigils, punctuation, performative TMTOWTDI one-liners, to the point of looking like line noise, is why it's so terribly designed and no longer relevant or interesting for LLM comprehension and generation.

    I think the ideal syntax for LLM language understanding and generation are markdown and yaml, with some python, javascript, and preferably typescript thrown in.

    As much as I have always preferred JSON to YAML, YAML is inarguably better for LLMs. It beats JSON because it avoids entropy collapse, has less syntax, and leaves more tokens and energy for solving problems instead of parsing and generating syntax! Plus, it has comments, which are a game changer for comprehension, in both directions.

    https://x.com/__sunil_kumar_/status/1916926342882594948

    >sunil kumar: Changing my model's tool calling interface from JSON to YAML had surprising side effects.

    >Entropy collapse is one of the biggest issues with GRPO. I've learned that small changes to one's environment can have massive impacts on performance. Surprisingly, changing from JSON to YAML massively improved generation entropy stability, yielding much stronger performance.

    >Forcing a small model to generate properly structured JSON massively constrains the model's ability to search and reason.

    YAML Jazz:

    https://github.com/SimHacker/moollm/blob/main/skills/yaml-ja...

    YAML Jazz: Why Comments Beat Compression

    The GlyphLang approach treats token count as THE bottleneck. Wrong. Comprehension fidelity is the bottleneck.

    The LLM already knows YAML from training. Zero tokens to teach it. Your novel syntax costs millions of tokens per context window in docs, examples, and corrections.

    Why YAML beats JSON for LLMs:

    Sunil Kumar (Groundlight AI) switched from JSON to YAML for tool calling and found it "massively improved generation entropy stability."

      "Forcing a small model to generate properly structured JSON 
       massively constrains the model's ability to search and reason."
    
    JSON pain:

      Strict bracket matching {}[]
      Mandatory commas everywhere  
      Quote escaping \"
      NO COMMENTS ALLOWED
      Rigid syntax = entropy collapse
    
    YAML wins:

      Indentation IS structure
      Minimal delimiters
      Comments preserved
      Flexible = entropy preserved
    
    The killer feature: comments are data.

      timeout: 30  # generous because API is flaky on Mondays
      retries: 3   # based on observed failure patterns
    
    The LLM reads those comments. Acts on them. JSON strips this context entirely.

    On symbol collision: noosphr nails it. Short symbols like @ activate Python decorators, shell patterns, email patterns simultaneously. The semantic noise may exceed the token savings.

    Perl's syntax fetish is why it's irrelevant for LLM generation. Dense punctuation is anti-optimized for how transformers tokenize and reason.

    The ideal LLM syntax: markdown, yaml, typescript. Languages it already knows cold.

  • rubyn00bie 1 day ago
    I think there’s a certain amount of novelty to this, and I find the aesthetic of the language pleasing, but I’m a little confused… Admittedly, I didn’t read the entire doc and only quickly glanced at the source… But is it just transpiling Golang code to and from this syntax, or is it intended to be a whole language eventually? Are folks able to just import Golang packages, or do they have to only use what packages are currently supported?

    Additionally I have two thoughts about it:

    1. I think this might be more practical as a transparent layer, so users can write and get Golang (or whatever the original language was) back - essentially making it something only the model reads/outputs.

    2. Longer term, it seems like both NVidia and AMD, along with the companies training/running the models, are focused on driving down cost per token because it’s just too damn high. And I personally don’t see a world where AI becomes pervasive without a huge drop in cost per token - it’s not sustainable for the companies running the models, and end users really can’t afford the real costs as they are today. My point being: will this even be necessary in 12-18 months?

    I could totally be missing things or lacking the vision of where this could go but I personally would worry that anything written with this has a very short shelf life.

    That’s not to say it’s not useful in the meantime, or not a cool project, more so if there is a longer term vision for it, I think it would be worth calling out.

    • goose0004 8 hours ago
      GlyphLang is intended to be a whole standalone language. It's implemented in Go, but it doesn't transpile to or from it. It has its own lexer, parser, type checker, bytecode compiler, and stack-based VM. If it helps, the compilation pipeline currently looks like this:

      source (.glyph) -> AST -> bytecode (.glyphc) -> VM.

      While the original intent was to have something tailored to AI that a human could manage, I'm realizing (to your point) that will absolutely not be necessary sometime in the likely near future. I've started working on making GlyphLang itself significantly more token-friendly and am adding a top layer that will essentially do what I think you've suggested. I'm adding expand and compact commands for bidirectional conversion between symbols and keywords that will allow engineers to continue developing with more familiar syntaxes on a top layer (.glyphx), while LLMs will generate actual .glyph code. Once completed, the pipeline will look like this:

      .glyphx (optional) -> .glyph -> AST -> bytecode -> VM
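
      To make the two layers concrete, here's a rough sketch of the same route in both forms. The keyword spellings on the .glyphx side (and the # comment syntax) are hypothetical placeholders; the expand/compact mapping isn't finalized yet:

        # .glyphx - human-facing keyword layer (hypothetical spellings)
        route GET /users/:id {
          let user = db.query("SELECT * FROM users WHERE id = ?", id)
          return user
        }

        # .glyph - what the LLM generates; the same program after compact
        @ GET /users/:id {
          $ user = db.query("SELECT * FROM users WHERE id = ?", id)
          > user
        }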

      Regarding #2, that's a great point and actually something I considered, though admittedly maybe not long enough. Regardless, I've tried to develop this with a value proposition that isn't purely about cost (though that does drive a lot of this). I'm also working on these 3 points:

      1. Reduced hallucinations: symbols are unambiguous - there shouldn't be confusion between def/fn/func/function across languages (no formal benchmarks yet, but they're planned)

      2. Context window efficiency: fitting more code in context allows for better reasoning about larger codebases, regardless of cost

      3. Language-neutrality (someone else brought this up): symbols work the same whether the model was trained on English, Spanish, or code

      I think even if tokens become free tomorrow, fitting 2x more code in a context window will still significantly improve output quality. Hopefully it will be necessary or at the very least helpful in the next 12-18 months, but who knows. I really appreciate the questions, comments, and callout!