Kodify - New Syntax Highlighter

Posted: Sat Jan 10, 2009 3:38 am
by Chris Corbyn
Addendum: I've now added bracket pairing... visible in the demo

Just thought I'd share something I've been working on, on and off, for a while that's now finished :)

It's a syntax highlighter with a difference. I can't stand the markup that syntax highlighters generate (bloated and non-semantic). What my version, Kodify, does is operate on the client side as a simple progressive enhancement using JavaScript.

If you have JS turned off then you see the source code just fine. If you have JS turned on then you get a colorful version of the source code. Simple.

The other thing Kodify does differently is that it fully lexically scans the code. I mean, it doesn't just use a big regex, which is very slow and limiting... instead it uses a lexical analyzer routine based on C's lex.

It's finished in the sense that the engine and the lexical analyzer (another project of mine) are built... it just needs a whole heap of language specifications adding (community effort would be nice, since I don't know all languages!).

I just threw together the JS language specification to show off what it does.

I haven't optimized it heavily (yet) but it's still really fast due to the lexical analysis routine it uses (say 11,000 bytes of source in under 100ms).

It binds to code blocks identified with the "kodify" class name along with the language (e.g. <pre class="kodify js">).

I will make it do a generic highlight (strings and comments) for unspecified languages.
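
As a rough illustration of the class-name binding described above, here is a minimal sketch. The helper name `parseKodifyClass` and the "generic" fallback label are assumptions made for this example and are not part of Kodify's actual API.

```javascript
// Hypothetical helper, not Kodify's real API: decide from a class attribute
// whether a block should be highlighted and which language it declares.
function parseKodifyClass(className) {
  var names = className.split(/\s+/).filter(function (n) { return n !== ""; });
  if (names.indexOf("kodify") === -1) {
    return null; // not marked for highlighting, so leave the block untouched
  }
  var others = names.filter(function (n) { return n !== "kodify"; });
  // Assumption: the first remaining class name is the language;
  // if there is none, fall back to a generic highlight.
  return { lang: others.length > 0 ? others[0] : "generic" };
}
```

In a browser, something like this could be applied to the `className` of each element returned by `document.getElementsByTagName("pre")`; with JS turned off nothing runs and the plain source stays visible.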

I've tested this on the following browsers:
  • Internet Explorer 6.0 (I'd like to try IE 7 and 8 but don't have access to them)
  • Opera 9.6
  • Safari 3.0
  • iPhone
  • Firefox 3.0
Example output of this code:
http://w3style.co.uk/~d11wtq/kodify/demo/

Code:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xml:lang="en">
  <head>
    <title>Kodify Demo</title>
    <link rel="stylesheet" type="text/css" href="../themes/blackboard.css" />
    <script type="text/javascript" src="../js/lx_analyzer.js"></script>
    <script type="text/javascript" src="../js/kodify.js"></script>
    <script type="text/javascript" src="../js/lang/js.js"></script>
  </head>
  <body>
    <div class="intro">
      <h1>JavaScript Source Code</h1>
      <p>
        View this page with JavaScript enabled, then try it with JavaScript turned off.
      </p>
      <p>
        View the HTML source and see how clear and semantic it is.
      </p>
    </div>
    
    <div class="example">
      <h2>JavaScript</h2>
      <code>
        <pre class="kodify js">
/**
 * This is a comment.
 */
var ClassA = function ClassA(argName) {
  this.publicProperty = argName;
  
  /** @private */
  var _privateVar = 42;
  
  this.methodName = function methodName(a, b, c) {
    return window.confirm(a + b + c);
  };
  
};
 
ClassA.prototype.otherMethod = function otherMethod() {
  this.publicProperty = 0xFF;
};
 
//Strings work fine and dandy
var regex = new RegExp("Word\\s+\"moon\"");
 
//RegExp literals are detected
var regexLiteral = /Word\s+"moon"/;
 
//The / c / part of this is not detected as a regex
var x = a + b / c / d * 9;
 
#Single line comments work
doSomething(/regex here/);
 
        </pre>
      </code>
    </div>
  </body>
</html>
 
Anybody likely to use this once I add support for lots of other languages and create new themes?

Things I definitely WILL add:
  • As many languages as I can get (I'll ask others to write the specs)
  • Heaps of themes
  • Support for bracket pairing (hover on a bracket to see the matching one)
  • Support for non-obtrusive line numbering, so you can copy & paste without the line numbers
  • Support for embedded languages (such as PHP/HTML, HTML/JavaScript, HTML/CSS)
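
For anyone wondering what bracket pairing involves, here is a minimal sketch using a stack. It is not how Kodify does it; the function name `pairBrackets` is made up, and it works on raw characters, whereas a real highlighter would work on tokens so that brackets inside strings and comments are ignored.

```javascript
// Sketch only: pair up brackets in a string using a stack.
// Returns an object mapping each paired bracket's index to its partner's index.
function pairBrackets(source) {
  var closers = { ")": "(", "]": "[", "}": "{" };
  var openers = "([{";
  var stack = []; // indexes of opening brackets still waiting for a partner
  var pairs = {};
  for (var i = 0; i < source.length; i++) {
    var ch = source.charAt(i);
    if (openers.indexOf(ch) !== -1) {
      stack.push(i);
    } else if (closers.hasOwnProperty(ch)) {
      var top = stack[stack.length - 1];
      if (top !== undefined && source.charAt(top) === closers[ch]) {
        stack.pop();
        pairs[top] = i;
        pairs[i] = top;
      }
      // A closer that does not match the innermost opener is left unpaired.
    }
  }
  return pairs;
}
```

On hover, the highlighter would then only need to look up the hovered bracket's index in the returned object to find the one to light up.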

Re: Kodify - New Syntax Highlighter

Posted: Sat Jan 10, 2009 7:14 am
by Chris Corbyn
Just added PHP support (visible in the demo). Too easy! :)

Re: Kodify - New Syntax Highlighter

Posted: Sat Jan 10, 2009 10:40 am
by josh
Very nice, if you ran it on its own source code would the space-time continuum be corrupted?

Re: Kodify - New Syntax Highlighter

Posted: Sat Jan 10, 2009 2:24 pm
by Chris Corbyn
jshpro2 wrote: Very nice, if you ran it on its own source code would the space-time continuum be corrupted?
I've had so many near misses with raptors I'm not prepared to attempt it again 8O

No, actually I've linked to it highlighting its own source code in the Coding Critique forum.

Re: Kodify - New Syntax Highlighter

Posted: Sun Jan 11, 2009 4:40 pm
by panic!
great work, so impressed mate!

Re: Kodify - New Syntax Highlighter

Posted: Sun Jan 11, 2009 7:24 pm
by Chris Corbyn
I've registered kodify.org and will get something more complete up there soon :)

To be honest I need contributors who can write language specs for the languages I don't know (and people who have an eye for good color schemes). The code is not quite ready for that yet but I'll ask when I need people :)

To come:

Code collapse (easy since I already pair up brackets, though collapsing XML/HTML is a slightly different ballgame)
Line Numbering

I'm also very curious if I could integrate (as a plugin) with TinyMCE/FCKEditor so that they act a little bit like an IDE for writing code in forums and stuff.

Re: Kodify - New Syntax Highlighter

Posted: Sun Jan 11, 2009 7:31 pm
by alex.barylski
I'm also very curious if I could integrate (as a plugin) with TinyMCE/FCKEditor so that they act a little bit like an IDE for writing code in forums and stuff.
That is what I was just about to suggest.

I played around with a similar idea years back...first using regex...which turned out to be extremely slow as the regex was executed each time a key was pressed.

Then I considered implementing a caret tracker, so the regex was only invoked when changes were applied outside of already colorized tokens. For instance, when editing in a string which is already colored (say red) there is no need to run the regex.

To further optimize, if you could determine what text was not in the current viewport, you could avoid regex'ing all non-visible text.
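
A minimal sketch of the caret-tracker idea, under stated assumptions: the token shape (`start`, `end`, `type`, with `end` exclusive) and both function names are hypothetical, and the rule that only quotes, backslashes and newlines can disturb a string is a simplification.

```javascript
// Hypothetical token shape: { start, end, type }, with end exclusive.
// Returns the token the caret sits strictly inside, or null.
function findTokenAt(tokens, offset) {
  for (var i = 0; i < tokens.length; i++) {
    if (offset > tokens[i].start && offset < tokens[i].end) {
      return tokens[i];
    }
  }
  return null;
}

// Assumption: typing inside an already colored string cannot affect the
// surrounding tokens unless the typed character could end or escape the string.
function needsRehighlight(tokens, caretOffset, typedChar) {
  var token = findTokenAt(tokens, caretOffset);
  if (token === null || token.type !== "string") {
    return true; // outside a string: re-run the highlighter
  }
  return typedChar === '"' || typedChar === "'" ||
         typedChar === "\\" || typedChar === "\n";
}
```

The caveat is the interesting part: a quote typed in the middle of a string changes everything after it, so the shortcut only holds for "harmless" characters.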

I'm not sure how fast something like that would be, but I see IDEs eventually being web based -- at least for PHP based projects.

That was actually the intent behind my TexoCMS (http://www.sourceforge.net/projects/texocms). I wanted something like a CMS and eventually an IDE so I could build a web site using templates and manage any code changes within the browser itself.

A while back someone posted a JS project which actually did something like this...but of course I cannot find it now. :P

Cheers,
Alex

Re: Kodify - New Syntax Highlighter

Posted: Sun Jan 11, 2009 9:11 pm
by Kieran Huggins
This is totally rad.

Re: Kodify - New Syntax Highlighter

Posted: Sun Jan 11, 2009 9:21 pm
by josh
I was actually thinking about that... it would be cool to be able to let clients update their templates in a JavaScript-powered IDE, not even necessarily WYSIWYG integrated... you could make it do Smarty / whatever... A while ago I made an editor for CSS; it was in PHP and didn't use AJAX, but it used dropdowns for valid attributes instead of letting/making the user type.

Re: Kodify - New Syntax Highlighter

Posted: Sun Jan 11, 2009 10:16 pm
by Chris Corbyn
I'm fairly sure this would be possible to do. It's probably simpler than an RTE since it doesn't have to generate HTML; it would be faulty if it did generate HTML ;) The HTML view exists purely in memory, in the DOM.

The way editors do the lexing so quickly (AFAIK) is that they only operate on X lines surrounding what you're editing (and no further than the viewport). If changes don't propagate further than that then it's all good, otherwise the lexical analyzer can move to the next block of code and decide if that needs updating.
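
Roughly, that windowing could look like the sketch below. This is only a guess at the general approach, not taken from any real editor; all names are hypothetical, and line indexes are zero-based and inclusive.

```javascript
// Sketch: pick the lines to re-lex after an edit, X lines either side of the
// edited line but never outside the viewport.
function relexRange(editedLine, context, viewportTop, viewportBottom) {
  return {
    first: Math.max(editedLine - context, viewportTop),
    last: Math.min(editedLine + context, viewportBottom)
  };
}

// Assumption: the lexer remembers which state it was in at the end of each
// block. If that state is the same after re-lexing, nothing later can have
// changed; if it differs, the next block needs updating too.
function changePropagates(previousEndState, newEndState) {
  return previousEndState !== newEndState;
}
```

For example, an unclosed `/*` typed in the window would leave the lexer in a comment state at the end of the block, so `changePropagates` would report that the following block must be re-lexed as well.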

Knowing how TextMate works I'm not sure how many editors use proper lexical analysis though... I'm fairly sure they just do crazy regex work.

The lexical analysis routines in Kodify are "programmed" in JavaScript, wrapping what is essentially a framework for stack-based lexical analysis.

For example, to match a double quoted string (API subtly different to the public version here, but algorithm the same):

Code:

//At top with other config settings
Lx.state("DOUBLE_STRING");
 
//Switch states when a " is seen, so now we only find tokens in the DOUBLE_STRING state
Kodify.rule('"', Lx.INITIAL).performs(function() {
  Kodify.matchedToken().class("string").append();
  Lx.PushState(Lx.DOUBLE_STRING);
});
 
//Copy all string contents, only allowing escaped double quotes
Kodify.rule(/(?:\\?[^"\\]|\\\\|\\")+/, Lx.DOUBLE_STRING).performs(function() {
  Kodify.matchedToken().class("string").append();
});
 
//Go back to the previous state (pop the current state of the state stack) when the next " is hit
Kodify.rule('"', Lx.DOUBLE_STRING).performs(function() {
  Kodify.matchedToken().class("string").append();
  Lx.PopState();
});
 
Since a lot of this is "boilerplate" code that will be present in almost all language declarations I'll provide wrappers in either Kodify (the highlighter) or Lx (the lexical analyzer) to do this. I already provide such a wrapper for matching things like /* comments */.

Code:

Kodify.rule("/*").performs(function() {
  Kodify.continueUntil("*/");
  Kodify.matchedToken().class("comment multiline").append();
});
Effectively the state stack means that you're not wasting cycles looking for tokens that cannot syntactically exist at certain points, and it also means you can distinguish say a function parameter from any other variable.
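
To make the state stack idea concrete without depending on the Lx/Kodify API, here is a self-contained sketch. Every name in it is made up for illustration, and it only knows about double-quoted strings; the point is that each state has its own short list of rules, so only those are tried.

```javascript
// Minimal stack-based lexer sketch. This is NOT the Lx/Kodify API.
function tokenize(source) {
  var rules = {
    INITIAL: [
      { pattern: /^"/, type: "string", push: "DOUBLE_STRING" },
      { pattern: /^[^"]+/, type: "plain" }
    ],
    DOUBLE_STRING: [
      // String contents: escaped characters or anything except " and \
      { pattern: /^(?:\\[\s\S]|[^"\\])+/, type: "string" },
      { pattern: /^"/, type: "string", pop: true }
    ]
  };
  var stack = ["INITIAL"];
  var tokens = [];
  var pos = 0;
  while (pos < source.length) {
    var stateRules = rules[stack[stack.length - 1]];
    var rest = source.slice(pos);
    var matched = false;
    for (var i = 0; i < stateRules.length && !matched; i++) {
      var m = stateRules[i].pattern.exec(rest);
      if (m) {
        tokens.push({ type: stateRules[i].type, text: m[0] });
        pos += m[0].length;
        if (stateRules[i].push) stack.push(stateRules[i].push);
        if (stateRules[i].pop) stack.pop();
        matched = true;
      }
    }
    if (!matched) {
      // No rule applies (e.g. a lone trailing backslash inside a string):
      // emit one character so the loop always advances.
      tokens.push({ type: "plain", text: source.charAt(pos) });
      pos += 1;
    }
  }
  return tokens;
}
```

While the top of the stack is DOUBLE_STRING, none of the INITIAL rules are even attempted, which is where the saved cycles come from.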

I've adopted a "standard" set of class names with subclasses of those. For example a string must be output with the class name of "string" so that the theme CSS file works. But I have "string literal" and "string heredoc" too, so in the theme file ".string" is a catch-all for strings of all types, with more fine-grained rules for ".string.literal" if you want to highlight those differently in your theme. Same goes for comments, variables and other types.

I'm quite excited about the possibilities :)

Re: Kodify - New Syntax Highlighter

Posted: Mon Jan 12, 2009 5:24 am
by jayshields
Have you checked out EtherPad? It does JavaScript syntax highlighting on-the-fly.

Re: Kodify - New Syntax Highlighter

Posted: Mon Jan 12, 2009 5:27 am
by Chris Corbyn
Hadn't seen that before no. Just had a look and bookmarked it for later reference :)

Re: Kodify - New Syntax Highlighter

Posted: Mon Jan 12, 2009 6:12 am
by papa
Looks very nice Chris Corbyn!

Let me know if I can help with the themes. :)

Re: Kodify - New Syntax Highlighter

Posted: Mon Jan 12, 2009 7:24 am
by Chris Corbyn
papa wrote:Looks very nice Chris Corbyn!

Let me know if I can help with the themes. :)
You can! The more the merrier. Let me finalize things a little more over the next couple of days (no point someone writing themes if things will change halfway through) then I'll be calling for help with themes and with new languages :)

I'm building the website at the moment so I can go public with a handful of languages that I know myself with a note on the website asking for contributors. It'd be really easy to put a WYSIWYG theme creator (written in JS) on the site too.

2-3 days.

Re: Kodify - New Syntax Highlighter

Posted: Fri Apr 10, 2009 1:32 pm
by Luke
Hey Chris! What's the status on this thing? I have been looking for a good code highlighter plugin for WordPress and have not been able to find any decent ones. I think I'm going to turn your Kodify into a WordPress plugin, would you mind?

EDIT: I'm also going to build a few themes for it. I would like a theme that looks like the default TextMate theme.