A streaming HTML parser for Elixir built on top of CloudFlare's LOL HTML.
Unlike traditional DOM-based parsers (like Floki), Laughter processes HTML as it streams in, making it ideal for:
- Crawlers - Extract links as the page downloads, not after
- Large documents - Constant memory usage regardless of document size
- Real-time processing - Get results before the full document arrives
- 🚀 Streaming - Process HTML chunk by chunk
- 🎯 CSS Selectors - Filter elements with familiar CSS syntax
- 💾 Memory bounded - Configurable memory limits
- 🔒 Thread-safe - Safe for concurrent use
- ⚡ Fast - Built on Rust's lol-html (used by Cloudflare Workers)
def deps do
[
{:laughter, "~> 0.2.0", github: "abiko-search/laughter", submodules: true}
]
endRequirements:
- Elixir ~> 1.15
- Rust (for compilation)
# Create a parser builder
builder = Laughter.build()
# Register CSS selectors - matched elements are sent as messages
link_ref = Laughter.filter(builder, self(), "a[href]")
# Create the parser
parser = Laughter.create(builder)
# Stream HTML in chunks (simulating network data)
parser
|> Laughter.parse("<html><body>")
|> Laughter.parse("<a href='/page1'>Link 1</a>")
|> Laughter.parse("<a href='/page2'>Link 2</a>")
|> Laughter.parse("</body></html>")
|> Laughter.done()
# Receive matched elements
receive do
{:element, ^link_ref, {"a", [{"href", "/page1"}]}} -> :ok
end
receive do
{:element, ^link_ref, {"a", [{"href", "/page2"}]}} -> :ok
endbuilder = Laughter.build()
# Pass `true` as 4th argument to receive text content
title_ref = Laughter.filter(builder, self(), "title", true)
builder
|> Laughter.create()
|> Laughter.parse("<html><head><title>My Page</title></head></html>")
|> Laughter.done()
receive do
{:element, ^title_ref, {"title", []}} -> :ok
end
receive do
{:text, ^title_ref, "My Page"} -> :ok
endbuilder = Laughter.build()
links = Laughter.filter(builder, self(), "a")
images = Laughter.filter(builder, self(), "img")
meta = Laughter.filter(builder, self(), "meta[name='description']")
# All selectors work on the same stream
builder
|> Laughter.create()
|> Laughter.parse(html)
|> Laughter.done()# Limit memory usage (bytes)
parser = Laughter.create(builder, max_memory: 16_384)
# Raises if limit exceeded
Laughter.parse(parser, very_large_html)# Specify character encoding
parser = Laughter.create(builder, encoding: "utf-8")Matched elements are sent as messages to the registered process:
# Element matched
{:element, reference, {tag_name, attributes}}
# Text content (when send_content: true)
{:text, reference, binary}
# Document end
{:end, reference}Laughter supports standard CSS selectors:
- Tag:
div,a,span - Class:
.content,div.main - ID:
#header - Attribute:
[href],[rel="nofollow"] - Combinators:
div > a,ul li,h1 + p - Pseudo-classes:
:nth-child(2),:first-child
Laughter processes HTML in a single pass without building a DOM tree:
| Parser | Memory (1MB HTML) | Time |
|---|---|---|
| Floki | ~10MB | ~50ms |
| Laughter | ~16KB (constant) | ~20ms |
- lol-html - CloudFlare's streaming HTML rewriter