Video Editor

The Video Editor: a timeline that thinks

The Video Editor is Mass's timeline-based editor: a real multi-track studio in your browser. Layer video clips, text, images, shapes, audio, and stickers on a frame-accurate timeline; add word-level animated captions; generate footage, voiceover, and avatars with AI; and render the finished cut at high quality. It's where short-form ads, VSLs (video sales letters), and social cuts get made.

20 min read · The complete Video Editor guide

What the Video Editor is

A frame-accurate, multi-track timeline editor that runs in the browser.

The editor is built around a timeline measured in frames at a fixed frame rate, so every edit is frame-accurate. The interface puts a live player at the top and the timeline below, with a header for project actions and controls for play/pause and seeking. Everything you place on the timeline is an overlay — a typed object with a start frame, a duration, and a track row — and the player composes them all into the running preview.

Because the composition is data, the same project plays in the live preview and renders to a final file from the exact same description. What you see scrubbing the timeline is what you get when you export.

  • Frame-accurate: the timeline is measured in frames at a fixed rate, so every cut lands exactly where you put it.
  • Live player: a real-time preview composes every overlay as you edit and scrub.
  • Overlay model: everything on the timeline is a typed overlay with a start, a duration, and a track row.
  • Preview = render: the final render is produced from the same composition you previewed.
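
To make the overlay model concrete, here is a minimal Python sketch of a timeline as data. It is an illustration only, not Mass's actual schema: the `Overlay` class, its field names, and the 30 fps rate are all assumptions. It shows how a player could decide which overlays to compose on a given frame.

```python
from dataclasses import dataclass

FPS = 30  # assumed fixed frame rate; the guide does not state the real value


@dataclass(frozen=True)
class Overlay:
    """Hypothetical overlay record: a type, a start, a duration, a row."""
    kind: str              # e.g. "video", "text", "sound", "caption"
    start_frame: int       # first frame on which the overlay is active
    duration_frames: int   # how many frames it stays on the timeline
    row: int               # track row it sits on

    @property
    def end_frame(self) -> int:
        # Exclusive end: the overlay is active on [start_frame, end_frame).
        return self.start_frame + self.duration_frames


def seconds_to_frame(seconds: float, fps: int = FPS) -> int:
    """Snap a time in seconds to the nearest whole frame."""
    return round(seconds * fps)


def active_overlays(overlays, frame):
    """Return the overlays active on `frame`, ordered by track row."""
    live = [o for o in overlays if o.start_frame <= frame < o.end_frame]
    return sorted(live, key=lambda o: o.row)


timeline = [
    Overlay("video", start_frame=0, duration_frames=150, row=0),
    Overlay("caption", start_frame=30, duration_frames=60, row=1),
    Overlay("sound", start_frame=0, duration_frames=300, row=2),
]

for overlay in active_overlays(timeline, seconds_to_frame(1.5)):
    print(overlay.kind, overlay.row)
```

Because the preview and the render would both read the same list of records, the same lookup yields the same result in either context, which is the idea behind "preview = render".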

Tracks, clips & overlays

Layer every kind of media on multi-row tracks and arrange them in time.

The timeline is multi-track: overlays sit on rows so they can stack and play together — a video clip on one row, a caption on another, music on a third. The editor supports the full range of overlay types: video clips, text, images, shapes, sound, captions (including a punchy word-by-word caption style), stickers, and camera regions for multi-camera work.

Clips carry their own in-point so you can show just part of a source video, and overlays are positioned and sized on the canvas as well as in time — so a layout and a sequence are edited together, not in separate tools.

  • Multi-track rows: stack clips, text, captions, and audio on separate rows that play together.
  • Every media type: video, text, image, shape, sound, captions, stickers, and camera regions.
  • Clip trimming in time: each clip has its own in-point so you can use just the part you want.
  • Canvas + timeline: overlays are positioned and sized on the canvas as well as placed in time.
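
The in-point idea can be sketched as a mapping from a timeline frame to a source frame. The `VideoClip` class and the `in_point_frames` field below are hypothetical names, and the sketch assumes the source and the timeline share one frame rate and the clip plays at normal speed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VideoClip:
    """Hypothetical clip record; field names are illustrative only."""
    start_frame: int       # where the clip begins on the timeline
    duration_frames: int   # how long it plays on the timeline
    in_point_frames: int   # frames skipped at the head of the source video


def source_frame(clip: VideoClip, timeline_frame: int) -> Optional[int]:
    """Map a timeline frame to the source frame the clip shows there.

    Returns None when the clip is not on screen at that frame.
    """
    offset = timeline_frame - clip.start_frame
    if not 0 <= offset < clip.duration_frames:
        return None
    return clip.in_point_frames + offset


# A clip placed at frame 100 that shows 60 frames of the source,
# starting 240 frames into the source.
clip = VideoClip(start_frame=100, duration_frames=60, in_point_frames=240)
print(source_frame(clip, 100))  # first visible source frame
print(source_frame(clip, 160))  # clip has already ended here
```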

Core editing: split, trim, duplicate & undo

The cut-room essentials — every action you expect from a real editor.

The editing operations are the ones you'd reach for in any timeline tool. Split cuts an overlay into two at the playhead, the foundation of any trim or transition; duplicate copies an overlay in place; and delete removes it. Selection drives a contextual toolbar so the controls match whatever you've clicked.
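
A split can be sketched as a pure function over a clip record. This is an assumption about how such an operation might work, not the editor's actual code: the `Clip` class and `split_at` function are hypothetical. The key detail is that the right-hand piece advances its in-point by the length of the left-hand piece, so the picture continues seamlessly across the cut.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Clip:
    """Hypothetical clip record; field names are illustrative only."""
    start_frame: int
    duration_frames: int
    in_point_frames: int = 0


def split_at(clip: Clip, playhead: int) -> Tuple[Clip, Clip]:
    """Cut `clip` into two adjacent clips at the playhead frame."""
    offset = playhead - clip.start_frame
    if not 0 < offset < clip.duration_frames:
        raise ValueError("playhead must fall strictly inside the clip")
    left = replace(clip, duration_frames=offset)
    right = replace(
        clip,
        start_frame=playhead,
        duration_frames=clip.duration_frames - offset,
        # Skip the source frames that the left piece already covers.
        in_point_frames=clip.in_point_frames + offset,
    )
    return left, right


left, right = split_at(Clip(start_frame=10, duration_frames=100, in_point_frames=5), 40)
print(left)
print(right)
```

Trimming then amounts to splitting and deleting one of the two pieces.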

A full history stack backs all of it — undo and redo step through your edits — and a canvas zoom (from 50% to 200%) plus a playback-rate control let you work precisely on a detail or review the whole piece at speed. Local media support means you can bring your own footage into the project directly.

  • Split at the playhead: cut any overlay into two for trims, cutaways, and transitions.
  • Duplicate & delete: copy an overlay in place or remove it, with a contextual toolbar per selection.
  • Undo / redo: a full history stack steps backward and forward through your edits.
  • Zoom & speed: canvas zoom (50–200%) and a playback-rate control for precise or fast work.

Word-level captions

Animated captions with per-word timing you can edit on a waveform.

Captions are a headline feature. The editor renders captions word by word, highlighting each word as it's spoken, with a punchy social-style option for high-retention short-form. A dedicated caption timeline shows the words as blocks against a waveform, so you can see and adjust exactly when each word appears.

The transcription is fully editable: change a word's text, drag its start and end timing on the waveform (feedback is live while you drag, and the change is saved only when you release), insert and delete words, drop in emoji, and mark words for emphasis. Timing validation flags problems, and the caption editor has its own undo/redo, so you can perfect the captions without leaving the tool.

  • Word-by-word: captions highlight each word as it's spoken, with a punchy social-style option.
  • Waveform timeline: edit per-word timing visually against the audio waveform.
  • Editable transcript: change text, retime, add or delete words, insert emoji, and emphasize words.
  • Validated: timing validation catches problems, with dedicated undo/redo for captions.
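
Per-word timing can be illustrated with a small sketch. The `Word` record, the use of seconds as the time unit, and the two validation rules shown (a word must end after it starts, and must not overlap the previous word) are assumptions; the guide does not say which checks the editor actually runs.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical caption word with its own timing, in seconds."""
    text: str
    start: float
    end: float
    emphasis: bool = False


def active_word(words: List[Word], t: float) -> Optional[Word]:
    """Return the word to highlight at time `t` (words sorted by start)."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1   # last word starting at or before t
    if i >= 0 and t < words[i].end:
        return words[i]
    return None                       # silence between words


def validate_timing(words: List[Word]) -> List[str]:
    """Collect human-readable timing problems; an empty list means OK."""
    problems = []
    for i, w in enumerate(words):
        if w.end <= w.start:
            problems.append(f"word {i} ({w.text!r}) ends at or before its start")
        if i > 0 and w.start < words[i - 1].end:
            problems.append(f"word {i} ({w.text!r}) overlaps the previous word")
    return problems


words = [
    Word("Stop", 0.0, 0.4),
    Word("scrolling", 0.4, 1.0, emphasis=True),
    Word("now", 1.2, 1.5),
]
print(active_word(words, 0.5))
print(validate_timing(words))
```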

Multi-camera & vertical preview

Frame multiple camera regions and preview a vertical cut alongside the main edit.

For talking-head and podcast-style content, the editor supports camera regions — overlays that frame a portion of the source so you can cut between angles or crops from a single recording. A vertical preview panel sits beside the main editor, so you can compose and check a portrait (9:16) version at the same time you edit the primary composition.

That makes repurposing a horizontal recording into vertical social cuts part of the same session, rather than a separate re-edit.

  • Camera regions: frame portions of the source to cut between angles or crops from one recording.
  • Vertical preview: compose and check a 9:16 portrait cut alongside the main edit.
  • Repurpose in place: turn a horizontal recording into vertical social cuts in the same session.

AI generation, avatars & voiceover

Generate footage, talking-head avatars, and voiceover without leaving the timeline.

The editor has AI built in. Generate video from a prompt or from a script, with a choice of generation models, and drop the result straight onto the timeline as a clip. Create talking-head avatar clips from a selection of avatars and voices, and apply Ken Burns motion to still images so a static shot earns its place in a moving piece.

An AI assistant rides alongside the editor: it can produce media and hand it to a generic "add to timeline" channel, so anything it makes — a generated clip, a music bed, an image — lands as a real overlay that plays and persists like everything else you placed by hand.

  • Generate footage: text-to-video and script-to-video with a choice of generation models.
  • Avatars & voices: create talking-head avatar clips from a library of avatars and voices.
  • Ken Burns motion: add pan-and-zoom motion to stills so static shots feel alive.
  • AI assistant: the assistant produces media and drops it onto the timeline as a real overlay.
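
Ken Burns motion is, in general, an interpolation between a starting and an ending framing of a still image. The sketch below shows that general technique with a smoothstep easing curve; the `Pose` type, the parameter names, and the choice of easing are assumptions, not a description of how the editor implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical framing of a still: zoom factor plus pan offset in pixels."""
    scale: float
    offset_x: float
    offset_y: float


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation from a to b as t goes from 0 to 1."""
    return a + (b - a) * t


def ken_burns(start: Pose, end: Pose, frame: int, duration_frames: int) -> Pose:
    """Framing of the still on `frame` of a `duration_frames`-long move."""
    if duration_frames <= 1:
        return start
    t = min(max(frame / (duration_frames - 1), 0.0), 1.0)
    eased = t * t * (3 - 2 * t)  # smoothstep: gentle start and stop
    return Pose(
        scale=lerp(start.scale, end.scale, eased),
        offset_x=lerp(start.offset_x, end.offset_x, eased),
        offset_y=lerp(start.offset_y, end.offset_y, eased),
    )


# Slow push-in with a slight pan over 91 frames.
start = Pose(scale=1.0, offset_x=0.0, offset_y=0.0)
end = Pose(scale=1.2, offset_x=-40.0, offset_y=10.0)
for frame in (0, 45, 90):
    print(frame, ken_burns(start, end, frame, 91))
```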

Rendering & export

Render the finished composition to a high-quality file from the same description you edited.

When the cut is ready, the editor renders the composition to a final video. Rendering runs as a job you can track to completion and then download, with server-side rendering and processing handling the heavy lifting so your browser isn't tied up. The aspect ratio you chose — landscape, square, or vertical — carries through to the output.

Because the render is generated from the same overlay composition you previewed, the exported file matches the timeline exactly: same timing, same captions, same layout. No surprises between preview and final.

  • Job-based render: rendering runs as a trackable job you can monitor and then download.
  • Server-side processing: the heavy lifting happens off your machine so the browser stays responsive.
  • Aspect-aware output: landscape, square, or vertical; the chosen ratio carries into the export.
  • Faithful export: the render is produced from the same composition you previewed, frame for frame.
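
Tracking a job to completion usually means polling its status until it reports success or failure. The sketch below shows that generic pattern only. Nothing in it is Mass's API: the `get_status` callback, the status dictionary keys (`state`, `download_url`, `error`), and the state names are all hypothetical, and the demo feeds the loop a canned sequence of statuses rather than contacting any server.

```python
import time
from typing import Callable, Dict


def wait_for_render(get_status: Callable[[], Dict],
                    poll_seconds: float = 2.0,
                    timeout_seconds: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a render job until it finishes and return its download URL.

    `get_status` is a caller-supplied function returning a dict such as
    {"state": "rendering"} or {"state": "done", "download_url": "..."}.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status["state"] == "done":
            return status["download_url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "render failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("render did not finish in time")
        sleep(poll_seconds)


# Demo with a canned status sequence standing in for a real job.
fake_statuses = iter([
    {"state": "queued"},
    {"state": "rendering"},
    {"state": "done", "download_url": "https://example.invalid/final.mp4"},
])
url = wait_for_render(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(url)
```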

Two ways to make video

The Video Editor is the timeline-level tool for hands-on, frame-accurate cuts. For script-to-VSL with kinetic captions inside the layered design canvas, see the Design Studio guide. The two tools complement each other.