Beyond iOS 26 and WebGPU: How We Use FFmpeg, WebGL, and Shaders to Power BrandLens Video

When I wrote about iOS 26 and WebGPU, I zoomed in on the platform-level shift Apple made — and why it matters for video-heavy apps. But what I didn’t talk about was how we actually build BrandLens under the hood today. Because making a video editor run seamlessly in a mobile browser — with no app download — takes a lot more than clever UI. It’s equal parts WebGL, FFmpeg, hardware encoding, and, yes, a fair amount of shader code.

So let’s get into the details of how it works.


Rendering in the Browser with WebGL + GLSL

Inside the BrandLens editor, everything you see on screen — video previews, filters, overlays, even timeline scrubbing — is rendered with WebGL. We use Three.js as our rendering engine (via a React Three.js library) to manage the scene and UI components. The heavy lifting happens inside GLSL shaders, which let us process video frames directly on the GPU in real time. Key tasks we handle with fragment shaders include:

  • Color Conversion: Camera feeds and video decoders often provide frames in YUV color formats, while WebGL works best with RGBA. Our shaders convert YUV to RGB on the fly so that the colors render correctly in the browser. (A shader sketch of this step follows this list.)
  • Alpha Blending: For transparent overlays like logos, stickers, or text, we compute per-pixel alpha blending in the shader. This ensures these elements composite in real time over the video, making brand assets feel native to the scene rather than post-processed.
  • Live Filters: Adjustments like brightness/contrast, color LUTs, and other creative effects are applied via fragment shader logic. Users can see the exact result immediately as they record or edit, with the GPU applying effects at 60fps instead of waiting for a slow render.
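To make that concrete, here is a minimal sketch of the kind of fragment shader we mean, written as a Three.js ShaderMaterial. It converts a luma texture plus an interleaved chroma texture (an NV12-style layout is assumed here) to RGB and applies a simple brightness/contrast filter; the texture layout, uniform names, and filter math are illustrative rather than our exact production code.

```ts
import * as THREE from "three";

// Sketch only: Three.js injects precision statements, the position/uv
// attributes, and the projection/modelView matrices into ShaderMaterial
// programs, so the shaders below only declare their own uniforms/varyings.
const yuvFilterMaterial = new THREE.ShaderMaterial({
  uniforms: {
    lumaTex: { value: null },   // Y plane (one channel per pixel)
    chromaTex: { value: null }, // interleaved U/V plane (assumed NV12-style)
    brightness: { value: 0.0 },
    contrast: { value: 1.0 },
  },
  vertexShader: /* glsl */ `
    varying vec2 vUv;
    void main() {
      vUv = uv;
      gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
    }
  `,
  fragmentShader: /* glsl */ `
    uniform sampler2D lumaTex;
    uniform sampler2D chromaTex;
    uniform float brightness;
    uniform float contrast;
    varying vec2 vUv;

    void main() {
      // Full-range BT.601 YUV -> RGB conversion
      float y = texture2D(lumaTex, vUv).r;
      vec2 uv = texture2D(chromaTex, vUv).rg - 0.5;
      vec3 rgb = vec3(
        y + 1.402 * uv.y,
        y - 0.344 * uv.x - 0.714 * uv.y,
        y + 1.772 * uv.x
      );
      // Live filter: basic brightness/contrast, applied per pixel on the GPU
      rgb = (rgb - 0.5) * contrast + 0.5 + brightness;
      gl_FragColor = vec4(clamp(rgb, 0.0, 1.0), 1.0);
    }
  `,
});
```

Per-pixel alpha blending for overlays lives in the same place: sample the overlay texture, then mix it over the video color by its alpha before writing gl_FragColor.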

This GPU-driven approach means creators can shoot, edit, and preview complex video effects without waiting on server processing. Even mid-range mobile devices can handle these real-time previews because the work is offloaded to the GPU. In short, WebGL + shaders give us a smooth interactive editing experience entirely in the browser.

From Browser to Server: FFmpeg + NVENC for Final Processing

Once a user finishes recording or editing their video in the browser, we ship the raw footage plus all the editing metadata (filters applied, overlay positions, etc.) to our servers. That’s where FFmpeg takes over to produce the final video. We essentially replicate the same effects and composition server-side to render a polished MP4 that’s ready to share. Here’s what happens on the backend:

  • Layering and Effects: Using FFmpeg’s filter pipeline, we reapply all the filters, text, and overlay layers that the user saw in the live preview. This ensures a WYSIWYG result – the final video looks identical to what the user saw in the editor.
  • Concatenation: If the user recorded multiple clips or has separate audio tracks, FFmpeg concatenates all the segments and syncs the audio into a single timeline. The result is one continuous MP4 video.
  • GPU-Accelerated Encoding: For speed, we leverage NVIDIA’s hardware encoder (NVENC) when rendering the final video. Offloading encoding to the GPU dramatically cuts processing time: in practice, FFmpeg’s NVENC integration (e.g. the h264_nvenc codec) gives us significantly faster encodes than CPU-only encoding, so the user gets the finished video back sooner. (On modern NVIDIA cards, hardware encoding can be an order of magnitude faster than libx264 software encoding in many cases.) A command-line sketch of this step follows the list.
  • CPU-Based Processing for Bulk Tasks: Not every job needs real-time speed. For large batch jobs or non-time-critical processing (like nightly re-processing of uploaded assets), we also use FFmpeg on CPU. The CPU encoders (like libx264/libx265) can squeeze out more compression efficiency or quality at the cost of time. We reserve these for cases where throughput matters more than latency.
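As a rough illustration of that render step (not our exact production command: the paths, filter values, and preset here are placeholders), invoking FFmpeg with h264_nvenc from a Node backend looks something like this:

```ts
import { spawn } from "node:child_process";

// Illustrative only: re-apply a brightness/contrast filter and a logo overlay,
// then encode the result with NVIDIA's NVENC hardware encoder.
const args = [
  "-i", "recording.mp4", // user's raw footage (hypothetical path)
  "-i", "logo.png",      // brand overlay (hypothetical path)
  "-filter_complex",
  "[0:v]eq=brightness=0.05:contrast=1.1[base];[base][1:v]overlay=x=24:y=24[out]",
  "-map", "[out]",
  "-map", "0:a?",        // keep the original audio if present
  "-c:v", "h264_nvenc",  // GPU encoder instead of libx264
  "-preset", "p5",       // NVENC speed/quality preset (newer FFmpeg builds)
  "-b:v", "8M",
  "-c:a", "aac",
  "final.mp4",
];

spawn("ffmpeg", args, { stdio: "inherit" });
```

Swapping h264_nvenc for libx264 (typically with CRF-based rate control) is roughly what the CPU path for bulk jobs looks like.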

By balancing GPU acceleration and CPU encoding, we optimize for both speed and quality as needed. A marketing manager generating a single video will get it back in seconds thanks to NVENC, while a batch export of 100 videos can run on CPU at high quality in the background. FFmpeg gives us the flexibility to tailor the pipeline per use case.

Alpha Video: Compatibility Meets Performance

One of the trickier challenges we tackled was supporting videos with transparency (alpha channel) for overlays and transitions. Traditional video formats don’t handle alpha well – for instance, H.264 doesn’t officially support an alpha channel in any broadly compatible way. Requiring special formats (QuickTime Animation, WebM VP9 with alpha, or huge PNG sequences) wasn’t an option for a web platform. Our solution: encode alpha into a standard video stream in a clever way and use WebGL to decode it on the fly.

Here’s how it works: we pack the RGBA video into a single frame by stacking the color data over the alpha data. In other words, we double the video’s height – the top half of each frame is the RGB video, and the bottom half is a grayscale image representing the alpha channel. This “stacked” video is encoded as a normal H.264 MP4, so any device can play it (it just sees it as a tall video). Then in the BrandLens editor’s WebGL layer, a fragment shader recombines the halves: it reads the top-half pixel for color and the corresponding bottom-half pixel for alpha, and merges them into a final RGBA output. Essentially, the shader unpacks the transparency at playback time.
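As a rough sketch (the real shader also deals with color space and texture orientation), the unpacking step boils down to a fragment shader like the one below, meant to slot into a Three.js ShaderMaterial like the earlier example. It assumes the color half sits at the top of the texture (v near 1) and the alpha half at the bottom.

```ts
// Hypothetical names; assumes one double-height texture with color on top
// and the grayscale alpha underneath.
const unpackStackedAlphaFrag = /* glsl */ `
  uniform sampler2D stackedTex; // the double-height "stacked alpha" frame
  varying vec2 vUv;             // 0..1 over the visible, single-height quad

  void main() {
    vec2 colorUv = vec2(vUv.x, 0.5 + vUv.y * 0.5); // top half: RGB
    vec2 alphaUv = vec2(vUv.x, vUv.y * 0.5);       // bottom half: alpha as gray
    vec3 rgb = texture2D(stackedTex, colorUv).rgb;
    float a  = texture2D(stackedTex, alphaUv).r;
    gl_FragColor = vec4(rgb, a); // recombined RGBA, composited by the scene
  }
`;
```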

This approach gives us the best of both worlds: maximum compatibility (it’s just a standard MP4 video under the hood) and real-time compositing via GPU. FFmpeg makes it straightforward to create these stacked-alpha videos. For example, we can take an input with an alpha channel and run it through a filter graph that extracts the alpha plane and stacks it below the color plane. The result is a double-height video where the alpha is baked in visually. We then encode that using H.264 for efficiency. The file sizes stay reasonable because the alpha plane, especially for graphics like logos or overlays, tends to compress very well (often it’s mostly blank or simple shapes, which H.264 encodes at a very low cost). In one demo, this technique added only a few kilobytes to the video for the alpha data, which is a negligible trade-off for getting full transparency support. Check out a great explanation at jakearchibald.com/2024/video-with-transparency/
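Here is a hedged sketch of that filter graph using standard FFmpeg filters (split, alphaextract, format, vstack); the input path and pixel-format handling are illustrative and may need adjusting for a given source. The arguments are passed to ffmpeg the same way as the NVENC example above.

```ts
// Build a double-height "stacked alpha" MP4 from a source that carries alpha.
const stackAlphaArgs = [
  "-i", "overlay-with-alpha.mov", // hypothetical source with an alpha channel
  "-filter_complex",
  [
    "[0:v]format=yuva420p,split=2[color][witha]", // duplicate the stream
    "[witha]alphaextract,format=yuv420p[alpha]",  // alpha plane as grayscale
    "[color]format=yuv420p[rgb]",                 // drop alpha from the color copy
    "[rgb][alpha]vstack=inputs=2[stacked]",       // color on top, alpha below
  ].join(";"),
  "-map", "[stacked]",
  "-c:v", "libx264",
  "-pix_fmt", "yuv420p",
  "stacked-alpha.mp4",
];
```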

On the WebGL side, using a shader for this means the user’s device does the alpha compositing work at playback time. But GPUs are so good at this that even an iPhone can easily composite with transparency at 60+ fps. We avoid any expensive CPU pixel manipulation, so playback remains smooth. Users just see their animated stickers or overlays seamlessly appear over video. They never have to know that under the hood we’re doing a little magic with how the video is encoded.

In case you’re wondering why we don’t just use formats that do support alpha, like WebM VP9 or HEVC with alpha: the answer is compatibility and performance. VP9 with alpha is great in desktop Chrome, but wasn’t supported on iOS at all for a long time. HEVC with alpha is an Apple-only extension and can’t easily be encoded on non-Apple hardware. Our stacked H.264 approach works everywhere, and we can encode it on any server with FFmpeg. It’s a little unorthodox, but it’s very reliable.

Why WebGPU Changes the Game (Future Outlook)

Right now, our stack of WebGL + FFmpeg is pushing the limits of what’s possible in a mobile browser. But we’re keeping a close eye on WebGPU, which is rolling out across modern browsers (including Safari in iOS 26) as the next-gen web graphics API. WebGPU will eventually let us take this to a whole new level. A few big opportunities we see:

  • Compute Shaders on the Client: Unlike WebGL, WebGPU isn’t limited to just drawing pixels. We’ll be able to run general-purpose compute shaders on video frames in the browser. That means more complex effects (e.g. multi-pass filters, motion tracking, even some AI-driven effects) could be done right on the user’s device. Imagine applying a full LUT or doing real-time background removal in the browser without a round-trip to the server. (A toy compute-shader sketch follows this list.)
  • Lower Overhead, More Power: WebGPU is a more modern API that talks more directly to the GPU (Metal on Apple platforms, Direct3D 12 or Vulkan elsewhere). The overhead per draw or per operation is lower, which translates to smoother playback and lower battery usage for the same work. In practice, this could let mid-range phones handle what only high-end phones can do now.
  • Better Cross-Platform Consistency: WebGPU comes with a new shading language (WGSL) and a more explicit GPU feature model. We expect fewer cross-browser quirks than we have with WebGL/GLSL. In theory, if it works in one WebGPU browser, it should work everywhere. This will save us time spent on device-specific bug fixes and let us focus on new features instead.
  • New Possibilities: We’re especially excited about combining WebGPU with the emerging WebCodecs and WebAssembly video tools. For example, decoding video frames to GPU memory, processing them with WebGPU, and even encoding video segments directly in the browser could open up true client-side rendering pipelines. It’s not far-fetched – frameworks are already experimenting with declarative video generation using React + WebGPU for timeline rendering.
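To give a flavor of what “compute shaders on the client” means in practice, here is a toy WebGPU sketch (deliberately simplified and not BrandLens code): a WGSL compute shader that brightens an RGBA float pixel buffer on the GPU. A real pipeline would pull frames from video textures or WebCodecs rather than a hand-filled buffer.

```ts
// Toy example: brighten every non-alpha value in an RGBA float pixel buffer.
const shaderCode = /* wgsl */ `
  @group(0) @binding(0) var<storage, read_write> pixels: array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    let i = id.x;
    if (i >= arrayLength(&pixels)) { return; }
    if (i % 4u != 3u) { // skip alpha components
      pixels[i] = clamp(pixels[i] + 0.1, 0.0, 1.0);
    }
  }
`;

async function brighten(frame: Float32Array): Promise<Float32Array> {
  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter!.requestDevice();

  // Upload the pixel data into a storage buffer the shader can read and write.
  const pixelBuffer = device.createBuffer({
    size: frame.byteLength,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(pixelBuffer, 0, frame);

  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module: device.createShaderModule({ code: shaderCode }), entryPoint: "main" },
  });
  const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{ binding: 0, resource: { buffer: pixelBuffer } }],
  });

  // Staging buffer for reading the result back to the CPU.
  const readback = device.createBuffer({
    size: frame.byteLength,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginComputePass();
  pass.setPipeline(pipeline);
  pass.setBindGroup(0, bindGroup);
  pass.dispatchWorkgroups(Math.ceil(frame.length / 64));
  pass.end();
  encoder.copyBufferToBuffer(pixelBuffer, 0, readback, 0, frame.byteLength);
  device.queue.submit([encoder.finish()]);

  await readback.mapAsync(GPUMapMode.READ);
  const result = new Float32Array(readback.getMappedRange().slice(0));
  readback.unmap();
  return result;
}
```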

In short, WebGPU is going to let us shift more of the video processing load to the client in a safe way, which both reduces server costs and gives users instant feedback. We’re actively prototyping with it. While WebGL got us amazingly far, we see WebGPU as the next leap for making the web feel as responsive and capable as native when it comes to video. (If you’re curious about the broader implications of WebGPU, check out my previous iOS 26 deep-dive for more on why it’s such a big deal.)

Others in the Game: Remotion, Shotstack, and More

We’re not the only ones exploring the frontier of video rendering with web tech – far from it. It’s worth mentioning a few other projects and companies tackling similar challenges in their own ways:

  • Remotion (Open Source): Remotion is a popular library that lets developers make videos programmatically using React components. Essentially, you write React code (leveraging HTML, CSS, SVG, even WebGL via Three.js) to define scenes, and Remotion renders them to real MP4 videos. It uses headless Chrome and FFmpeg under the hood to capture your React app frame-by-frame into a video. The cool part is that it opens up video creation to web developers – if you know React, you can produce dynamic videos with data, animations, etc. We share a philosophy with Remotion: using web technologies for video rendering. However, Remotion is more for programmatically generating videos (e.g. an automated “Year in Review” montage), whereas BrandLens is about interactive, live user editing. In fact, Remotion even supports Three.js scenes in React, similar to how our editor uses React + Three.js for the UI. It’s exciting to see frameworks like this because it validates that web tech is powerful enough for serious video work.
  • Shotstack (Cloud API): Shotstack takes a slightly different approach – it’s a cloud video editing API aimed at developers who want to generate or edit videos via a web service. You can think of it as a high-level wrapper around FFmpeg in the cloud. Instead of writing FFmpeg commands, you send Shotstack a JSON describing your video timeline (clips, transitions, text, etc.), and their cloud renders the video for you. They pitch themselves as an “FFmpeg alternative” for those who don’t want to maintain their own video rendering servers. The interesting connection here is that Shotstack also highlights how heavy and complex running FFmpeg at scale can be, and their solution is to handle it for you in the cloud. In our case, we built our own pipeline, but the very existence of Shotstack shows how much demand there is for easier video processing backends. It’s a more back-end oriented cousin to what we do – whereas we focus on a real-time user-facing editor in the browser, Shotstack is more about automating video creation on servers. Both approaches use GPUs and smart encoding to speed things up.
  • FFmpeg/WASM and WebCodecs projects: There’s also interesting work in bringing parts of video processing directly into the browser via WebAssembly. Projects like FFmpeg.wasm compile FFmpeg to WebAssembly so it runs in the browser, and when combined with the emerging WebCodecs API (for efficient frame decoding/encoding in JS), developers can do things like apply filters to an existing video in real time on the client. This is adjacent to our approach – we chose WebGL for live processing rather than software filters via WASM – but it’s another sign that the gap between “what you can do on a client vs. a server” for video is closing. As WebGPU and WebCodecs mature, expect to see even more hybrid approaches where the browser does the heavy lifting and the cloud is just for storage or final assembly.

All of these examples confirm what we’ve been betting on: GPUs in the browser are no longer a toy – they’re becoming the backbone of modern interactive media apps. Whether it’s a React developer generating videos with code, or an API stitching clips in the cloud, or our in-browser editor mixing shaders and FFmpeg, the industry is converging on this idea of treating video as a dynamic, programmable medium.

What It All Means for Users (and the Future of Video Creation)

At the end of the day, we pour all this engineering effort into BrandLens for a simple result: a seamless experience for the creator and the end viewer. A campaign manager or an everyday user doesn’t need to know that we’re juggling YUV color conversion in a shader, or encoding video with NVENC on a GPU server, or packing alpha channels into H.264. They just know that they can tap a link or scan a QR code, record a video with fun overlays and filters, and get a polished clip ready to share – all without installing an app or waiting forever for processing. The technology vanishes into the background.

This is why developments like WebGPU excite us: every advancement at the platform level translates into less friction for the user. If we can do more on the client GPU, the user sees results instantly and we rely less on bandwidth or cloud compute. If encoding gets faster, the user waits less to get their video. If our pipeline is more efficient, we can scale up campaigns for our clients without compromising on quality or speed.

In a world where authentic video content is king, lowering the barriers to create that content is a big deal. Our mission is to take the kind of tech Hollywood has (fancy editors, powerful render farms, custom effects) and put it in the hands of everyday people through a web browser. That’s a tall order, but it’s becoming more achievable every year with these advances. BrandLens is a product of this convergence: web tech, GPU power, and creative design coming together.
