Here's an interesting factoid about browser extensions: lots of them are not about extending the browser at all. By my count, about 75% of this week's top 20 Firefox extensions are more about extending the web content rendered by the browser than extending the browser itself. Similar trends exist in other browser extension systems.
Chromium extensions will be able to interact with web content too, using a feature we're calling content scripts (we've gone around and around on the name, so this may not be final). The code for this is at a pretty good stopping point now, so I wanted to pause and write down what we did, why we did it, and some ideas I have for future improvements.
If you want to try it out, you can check out the beginnings of our Extension Tutorial, which covers most of what I'll talk about here.
First, some background on the feature...
Content scripts are basically the same thing as Greasemonkey scripts, with some important improvements.
You register your content scripts declaratively in your extension's manifest, like this:
{
  "name": "My first extension",
  "description": "The first extension that I made",
  "version": "1.0",
  "content_scripts": [
    {
      "matches": ["http://www.google.com/*", "http://mail.google.com/"],
      "css": ["foo.css", "bar.css"],
      "js": ["hot.js", "dog.js"],
      "run_at": "document_start"
    }
  ]
}
The syntax for matching URLs is slightly different than in Greasemonkey. The reason for this is that we wanted to eliminate a common bug in Greasemonkey scripts, where people accidentally match URLs more loosely than they intend. A classic example is the common Greasemonkey pattern @include *.google.com*, which can match pages on any domain (for example, http://www.google.com.evil.example/), not just google.com and its subdomains.
The matching syntax used in content scripts separates the domain portion of the pattern from the path portion, making it more explicit which sites a script will run on. One way we could use this is to someday do UI like this:
==============================================
Install 'My extension'?
----------------------------------------------
This extension will be able to interact with
web pages on:
www.google.com
mail.google.com
[ok] [cancel]
==============================================
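To make the difference concrete, here is a simplified matcher in JavaScript. It is a sketch, not Chromium's implementation: it assumes patterns of the form scheme://host/path, treats "*" in the path as "any characters", and (an assumption this post doesn't specify) lets a host written as "*.google.com" match google.com and its subdomains. All function names are hypothetical. The Greasemonkey-style glob is compiled the way @include treats it, with "*" matching anything anywhere in the URL.

```javascript
// Greasemonkey-style @include glob: '*' matches any run of characters anywhere in the URL.
function globToRegExp(glob) {
  const parts = glob.split("*").map(s => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp("^" + parts.join(".*") + "$");
}

// Hypothetical parser for patterns like "http://www.google.com/*".
function parseMatchPattern(pattern) {
  const m = /^(https?):\/\/([^/]+)(\/.*)$/.exec(pattern);
  if (!m) throw new Error("Invalid match pattern: " + pattern);
  return { scheme: m[1], host: m[2], path: m[3] };
}

function matchesPattern(pattern, url) {
  const p = parseMatchPattern(pattern);
  const u = new URL(url);
  if (u.protocol !== p.scheme + ":") return false;
  // The host is checked on its own, so nothing in the path can stand in for a host.
  const hostOk = p.host.startsWith("*.")
    ? u.hostname === p.host.slice(2) || u.hostname.endsWith(p.host.slice(1))
    : u.hostname === p.host;
  return hostOk && globToRegExp(p.path).test(u.pathname + u.search);
}

// Hosts to list in an install prompt like the one above.
function hostsForInstallPrompt(patterns) {
  return [...new Set(patterns.map(p => parseMatchPattern(p).host))];
}

const patterns = ["http://www.google.com/*", "http://mail.google.com/"];
const evil = "http://www.google.com.evil.example/";

console.log(globToRegExp("*.google.com*").test(evil));     // true
console.log(patterns.some(p => matchesPattern(p, evil)));  // false
console.log(hostsForInstallPrompt(patterns));              // [ 'www.google.com', 'mail.google.com' ]
```

Because the host is compared separately, a pattern can only ever match the hosts it names, which is also what makes a host list for the install prompt trivial to derive.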
Other minor feature differences:
- A content script can consist of multiple physical JavaScript files or CSS files, and it can also reference images or other resources included in the extension by URL.
- Content scripts support "early injection": by setting the optional "run_at" key to "document_start", a script can ask to be injected before any nodes have been added to the document.
Execution Environment
To understand the execution environment for content scripts, it helps to first understand the execution environment of normal web page JavaScript.
All JavaScript is defined in a context. Each DOM window gets its own context, one purpose of which is to hold the prototypes of all the global objects (Object, Array, String, and so on). This is why, when you extend Array.prototype in one frame, it doesn't affect Arrays created in other frames.
Importantly, you can call functions and access objects across contexts. This happens normally when you do something like window.frames['otherframe'].someFunction().
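One way to see per-context prototypes without a browser is Node's vm module, where each vm.createContext() call creates a separate V8 context with its own built-ins. The two contexts below stand in for two frames (an analogy, not how Chromium wires up frames): extending Array.prototype in one leaves the other untouched, yet a function created in one context can still be called from the other.

```javascript
const vm = require("vm");

// Two separate V8 contexts standing in for two frames; each has its own built-ins.
const frameA = vm.createContext({});
const frameB = vm.createContext({});

// Extend Array.prototype in frame A only.
vm.runInContext("Array.prototype.last = function () { return this[this.length - 1]; };", frameA);

const lastInA = vm.runInContext("[1, 2, 3].last()", frameA);   // uses A's Array.prototype
const hasLastInB = vm.runInContext("typeof [].last", frameB);  // B's Array.prototype is untouched

// Calling across contexts still works: give frame B a function created in frame A.
frameB.fromA = vm.runInContext("(function (x) { return x * 2; })", frameA);
const doubled = vm.runInContext("fromA(21)", frameB);

// An array keeps its home context's prototype, so frame B's instanceof check fails.
const arrayFromA = vm.runInContext("[]", frameA);
const isInstanceInB = vm.runInContext("(x) => x instanceof Array", frameB)(arrayFromA);

console.log(lastInA, hasLastInB, doubled, isInstanceInB);  // 3 undefined 42 false
```

The last check shows a practical consequence: instanceof compares against the caller's own Array, which is why code that handles objects from other frames usually relies on Array.isArray() instead.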
[Diagram: the relationship between the various objects, in pretty picture form (thanks, Gliffy!)]
Each context also has a single global object. When you access global variables in a JavaScript program, you are really interacting with the properties of this global object. In HTML, the global object is of course the Window object.
To make property hiding work, Chromium's implementation does not use the JavaScript object that represents ("wraps") the C++ DOMWindow as the global object. Instead, there is a separate JavaScript object whose __proto__ points to that wrapper. When you define global variables, it is this object where the properties are actually defined.
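Here is a plain-object model of that arrangement. The objects are hypothetical stand-ins (the real wrapper is created by the V8 bindings), and the sketch assumes "property hiding" means that a script's globals can shadow the Window's own properties without modifying the wrapper underneath.

```javascript
// Hypothetical stand-in for the JS object that wraps the C++ DOMWindow.
const domWindowWrapper = {
  alert(msg) { return "native alert: " + msg; },
  location: "http://www.google.com/",
};

// The object scripts actually use as their global: a separate object whose
// prototype (__proto__) is the wrapper.
const globalObject = Object.create(domWindowWrapper);

// Defining globals puts properties on the global object, not on the wrapper...
globalObject.counter = 1;
globalObject.alert = msg => "page-defined alert: " + msg;

// ...so they shadow the wrapper's properties without modifying it, while
// everything else is still found through the prototype chain.
console.log(globalObject.alert("hi"));       // page-defined alert: hi
console.log(domWindowWrapper.alert("hi"));   // native alert: hi
console.log(globalObject.location);          // http://www.google.com/
console.log("counter" in domWindowWrapper);  // false
```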
Ok, so how do content scripts fit into this?
Content scripts run in a very similar-looking environment. They run in a separate context, and have a separate global object. But that global object's __proto__ points at the same JS object that represents the Window.

So content scripts get their own global scope and their own set of prototypes. Variables defined by the web page aren't "visible" to content scripts by default, and vice versa. Other than that, the environment for content scripts is exactly the same as for normal JavaScript running in web pages, so writing a content script should feel just like writing JavaScript for a web page.
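The same plain-object model can show the content-script case: two global objects, one for the page and one for the content script, both with the DOMWindow wrapper as their prototype. The names are hypothetical, and the model leaves out the separate built-in prototypes that each real context also has.

```javascript
// Hypothetical stand-in for the wrapper of the page's C++ DOMWindow.
const domWindowWrapper = { document: { title: "Inbox" } };

// Page JavaScript and the content script each get their own global object,
// but both use the same wrapper as their prototype.
const pageGlobal = Object.create(domWindowWrapper);
const contentScriptGlobal = Object.create(domWindowWrapper);

// Globals defined on one side stay on that side's global object.
pageGlobal.gmailState = { unread: 3 };
contentScriptGlobal.myHelper = () => "helper";

console.log(contentScriptGlobal.gmailState);  // undefined
console.log(pageGlobal.myHelper);             // undefined
// The DOM, reached through the shared wrapper, is the same object for both.
console.log(contentScriptGlobal.document === pageGlobal.document);  // true
```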
Sometimes it is useful to access the page's global variables. For example, Gmail has an API that allows Greasemonkey scripts to drive some parts of its UI. To allow this kind of functionality, the content script environment defines a special contentWindow global variable that can be used to access the global scope of the page's JavaScript.
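A sketch of how that looks from the content script's side, using Node's vm contexts as stand-ins for the page and content-script contexts. Modeling contentWindow as a global that points straight at the page's global object is an assumption about its shape, and pageApi is a hypothetical page variable, not Gmail's real API.

```javascript
const vm = require("vm");

// The page's context, where page JavaScript defines a global (hypothetical) API object.
const pageContext = vm.createContext({});
vm.runInContext(
  "var pageApi = { greet: function (name) { return 'hi ' + name; } };",
  pageContext
);

// The content script runs in its own context; contentWindow is modeled here
// as a global pointing at the page's global object.
const contentScriptContext = vm.createContext({ contentWindow: pageContext });

const direct = vm.runInContext("typeof pageApi", contentScriptContext);
const viaContentWindow = vm.runInContext(
  "contentWindow.pageApi.greet('content script')",
  contentScriptContext
);

console.log(direct);            // undefined: page globals aren't in the content script's scope
console.log(viaContentWindow);  // hi content script
```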
Permissions
Another difference from Greasemonkey is the model for accessing privileged APIs. Greasemonkey scripts have direct access to some privileged APIs. The most popular of these is GM_xmlhttpRequest, which provides access to origins other than the one for the current document. These APIs are very useful, but there have been bugs where they leaked into web content, letting ordinary web pages call privileged functions they should never have been able to reach.
In order to prevent this from being possible, Chromium extensions are split into two main pieces: a privileged part (I'll call it just 'the extension' from now on) that has access to special powerful APIs, and an unprivileged part (the content script) that runs in the renderer and has no special APIs.
The two parts cannot interact directly. In fact, they run in separate OS processes, so direct interaction is impossible. The only way they can communicate is via message passing APIs, similar to postMessage().
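The key property of the split is that only data crosses the boundary. A minimal model, using JSON serialization as a stand-in for the real IPC encoding (an assumption) and a hypothetical makeChannel() helper, shows what that means in practice: the receiver gets a copy rather than a reference, and functions don't make it across at all. Delivery is synchronous here for simplicity; real cross-process messaging is asynchronous.

```javascript
// Model of a process boundary: a message leaves as a string and is rebuilt on the
// other side, so sender and receiver never share objects.
function makeChannel(onMessage) {
  return {
    postMessage(message) {
      const wire = JSON.stringify(message);  // serialized in the sending process
      onMessage(JSON.parse(wire));           // a fresh object in the receiving process
    },
  };
}

const received = [];
const toExtension = makeChannel(msg => received.push(msg));

// A content script sends a (hypothetical) request, including a function property.
const original = { type: "getFeed", url: "http://example.com/feed", callback() {} };
toExtension.postMessage(original);

received[0].url = "changed by receiver";
console.log(original.url);                 // http://example.com/feed
console.log(typeof received[0].callback);  // undefined: functions don't survive serialization
```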

(NOTE: The implementation of content script messaging is still in progress and is incomplete in current trunk and dev builds)
It is the extension developer's responsibility to send only specific messages to the extension process from the renderer, and to validate those messages carefully. Extension developers need to be aware that malicious web pages could send them messages exactly the same way their content scripts can.
This design is modeled after the way Chromium itself works, where the renderers are untrusted and have to send messages to the browser process to get interesting work done.
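What "validate those messages carefully" might look like on the extension side: a hypothetical handler that accepts exactly one message type and refuses to let a message choose an arbitrary target for a privileged request. The message format, the "fetchUrl" type, and the host allowlist are all assumptions for illustration, since the messaging API itself is still in progress.

```javascript
// Hosts this (hypothetical) extension is willing to fetch on a content script's behalf.
const ALLOWED_HOSTS = new Set(["www.google.com", "mail.google.com"]);

// Treat every message as untrusted: a malicious page can send the same messages
// a content script can.
function validateMessage(msg) {
  if (typeof msg !== "object" || msg === null) return { ok: false, error: "not an object" };
  // Accept only the specific message types the extension expects.
  if (msg.type !== "fetchUrl") return { ok: false, error: "unknown message type" };
  if (typeof msg.url !== "string") return { ok: false, error: "url must be a string" };
  let parsed;
  try {
    parsed = new URL(msg.url);
  } catch {
    return { ok: false, error: "malformed url" };
  }
  // Never let the sender pick an arbitrary scheme or host for a privileged request.
  if (parsed.protocol !== "http:" || !ALLOWED_HOSTS.has(parsed.hostname)) {
    return { ok: false, error: "url not allowed" };
  }
  return { ok: true, url: parsed.href };
}

console.log(validateMessage({ type: "fetchUrl", url: "http://mail.google.com/feed" }).ok);  // true
console.log(validateMessage({ type: "fetchUrl", url: "file:///etc/passwd" }).ok);          // false
```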
Future Directions
I have a couple ideas for where I'd like to take this next...
Idea 1: Completely separate content scripts and page JavaScript
Right now, JavaScript access to the DOM is implemented using what is essentially a single global table of JavaScript wrappers, one for each C++ DOM object. Whenever code needs to find the JS object for a given C++ object, it consults this table.
This single table creates a bridge between any two JavaScript contexts that have access to the same DOM nodes. For example, if page JavaScript does something like document.body.onclick = function() { ... }, any other code that has access to document.body will also have access to the onclick handler that the page JavaScript defined.
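A toy model of the single table: a WeakMap from stand-in C++ DOM objects to JS wrappers, shared by every context that asks. It is hypothetical and far simpler than the real bindings, but it shows why a handler set by the page is visible to anyone else who looks up the same node.

```javascript
// One table for everybody: C++ DOM object (stand-in) -> its single JS wrapper.
const wrapperTable = new WeakMap();

function wrap(domNode) {
  let wrapper = wrapperTable.get(domNode);
  if (!wrapper) {
    wrapper = { node: domNode, onclick: null };
    wrapperTable.set(domNode, wrapper);
  }
  return wrapper;
}

const bodyNode = { tagName: "BODY" };  // stands in for the C++ body element

// Page JavaScript sets a handler on document.body...
wrap(bodyNode).onclick = function pageHandler() { return "page secret"; };

// ...and a content script looking up the same node gets the same wrapper, handler included.
const seenByContentScript = wrap(bodyNode).onclick;
console.log(seenByContentScript());  // page secret
```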
This makes sense for web pages, where you want frames in the same origin to see the same sets of JavaScript variables. But for content scripts, it would be nice to wall these two worlds off from each other. It is relatively infrequent for content scripts to need to see the JavaScript environment of pages; more typically, they only need access to the DOM.
In order to isolate content scripts from page JavaScript, we'd have to have separate mapping tables: one for the page JavaScript, and one for each content script. A C++ DOM node could have multiple wrappers, one for each of these "worlds". Then, when we needed to get a JavaScript object for a particular C++ object, we'd decide which table to look in based on which context the calling code was running in. Every context could only be in one "world".

We could even add assertions to the JavaScript engine that worlds are never bridged. That way if we ever had a bug, in the worst case we'd crash the renderer, not have a security problem.
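A sketch of the per-world alternative under the same toy assumptions: each hypothetical World owns its own wrapper table, so the page and a content script get different wrappers for the same node, and assertSameWorld() plays the role of the engine assertion (throwing here, where a renderer would crash).

```javascript
// Each world has its own table from DOM object (stand-in) to wrapper.
class World {
  constructor(name) {
    this.name = name;
    this.wrappers = new WeakMap();
  }
  wrap(domNode) {
    let wrapper = this.wrappers.get(domNode);
    if (!wrapper) {
      wrapper = { node: domNode, world: this, onclick: null };
      this.wrappers.set(domNode, wrapper);
    }
    return wrapper;
  }
}

// If a wrapper from one world ever reaches code running in another, stop hard
// instead of leaking it across.
function assertSameWorld(callerWorld, wrapper) {
  if (wrapper.world !== callerWorld) {
    throw new Error(`world bridged: ${wrapper.world.name} -> ${callerWorld.name}`);
  }
  return wrapper;
}

const pageWorld = new World("page");
const contentScriptWorld = new World("content script");
const bodyNode = { tagName: "BODY" };

pageWorld.wrap(bodyNode).onclick = () => "page secret";
const csBody = contentScriptWorld.wrap(bodyNode);

console.log(csBody.onclick);                                 // null: separate wrapper
console.log(csBody.node === pageWorld.wrap(bodyNode).node);  // true: same underlying node
```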
If we can wall these worlds off from each other, then we can offer some increased privileges to content scripts directly, because we'd be confident that they couldn't leak to web content. You'd no longer have to go to the extension process to get cross-origin XHR, for example. This would also have the advantage of not requiring extension developers to carefully validate their messages, since we would know that page JavaScript could not send messages to the extension.
We'd still probably need content scripts as they exist today if you want to interact with the JS defined by the page (for example for the Gmail API). But lots of use cases don't need that, and this idea would decrease complexity for those cases.
Idea 2: DOM Access from Extension Processes
Another idea is to offer some form of DOM access directly to extension processes. There is a team in Chromium working on an out-of-process version of the web inspector. This will clearly need some form of DOM access to work, so we can probably reuse what they do to give extension developers the ability to interact with page DOM directly from their extension process.
I can imagine something simple based on querySelectorAll(). You ask for some nodes based on a CSS expression, get back a snapshot, and then send some updates. Of course, there are problems with races: the nodes might be gone by the time you send the update. But I think in most cases this would work pretty nicely. Again, I think we'd want to keep content scripts as they are today for more complex needs.
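A rough sketch of the snapshot-then-update idea, including the race. Everything here is hypothetical: a Map stands in for the renderer's DOM, the matcher only understands "tag" and ".class" selectors rather than real CSS, and snapshot() and applyUpdates() are made-up names. Updates aimed at nodes that disappeared in the meantime are reported as stale instead of applied.

```javascript
// Toy renderer-side DOM: node id -> node.
const dom = new Map([
  [1, { tag: "a", className: "ad", text: "Buy now" }],
  [2, { tag: "a", className: "", text: "Home" }],
  [3, { tag: "div", className: "ad", text: "Sponsored" }],
]);

// Stand-in for querySelectorAll(): supports only "tag" and ".class".
function matches(node, selector) {
  return selector.startsWith(".") ? node.className === selector.slice(1) : node.tag === selector;
}

// The extension asks for a snapshot: copies of the matching nodes, tagged with ids.
function snapshot(selector) {
  return [...dom]
    .filter(([, node]) => matches(node, selector))
    .map(([id, node]) => ({ id, ...node }));
}

// The extension sends updates back; nodes removed since the snapshot are the race.
function applyUpdates(updates) {
  const stale = [];
  for (const { id, text } of updates) {
    const node = dom.get(id);
    if (node) node.text = text;
    else stale.push(id);
  }
  return { stale };
}

const ads = snapshot(".ad");  // nodes 1 and 3
dom.delete(3);                // the page removes a node before the update arrives
const result = applyUpdates(ads.map(n => ({ id: n.id, text: "[hidden]" })));
console.log(dom.get(1).text, result.stale);  // [hidden] [ 3 ]
```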
Yawn... Greasemonkey is great, but when do we get real extensions?
I know, I know. These aren't "real" extensions. You want to know when you'll be able to put things in the Chrome UI. Good news: that is well underway. Hopefully my next blog post will be about how to add "toolstrips" to Chromium.
Until then, have a look at content scripts and let us know what you think.