polyclaw v5.0.0

Media Handling

Polyclaw supports sending and receiving files, images, audio, and video through its messaging channels.

Media Classification

The media/classify.py module maintains a MIME type registry that classifies files into categories:

Category   Examples
Image      JPEG, PNG, GIF, WebP, SVG, BMP
Audio      MP3, WAV, OGG, FLAC, M4A, AAC
Video      MP4, WebM, MOV
File       PDF, DOCX, XLSX, ZIP, etc.
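
The classification above can be approximated with the standard library alone. This is a minimal sketch, not the contents of media/classify.py: the function name classify, the lowercase category strings, and the fallback to "file" for unknown types are all assumptions made for illustration.

```python
import mimetypes

# Hypothetical mapping from the top-level MIME type to a media category.
_CATEGORY_BY_TOP_LEVEL = {"image": "image", "audio": "audio", "video": "video"}


def classify(filename: str) -> str:
    """Return 'image', 'audio', 'video', or 'file' for a filename."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Unknown extension: treat it as a generic file.
        return "file"
    top_level = mime.split("/", 1)[0]
    return _CATEGORY_BY_TOP_LEVEL.get(top_level, "file")
```

Under these assumptions, classify("photo.png") returns "image" and classify("report.pdf") returns "file", because application/* types fall through to the generic category.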

Directory Structure

~/.polyclaw/media/
  incoming/     # Downloaded from channels
  outgoing/     # Generated by the agent
    pending/   # Awaiting delivery
    sent/      # Successfully delivered
    error/     # Failed delivery
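
A layout like the one above could be created at startup along the following lines. This is a sketch only: the helper name ensure_media_dirs and its root parameter are hypothetical, and the subdirectory list simply mirrors the tree as drawn.

```python
from pathlib import Path

# Subdirectories mirroring the tree above, relative to the media root.
MEDIA_SUBDIRS = (
    "incoming",
    "outgoing/pending",
    "outgoing/sent",
    "outgoing/error",
)


def ensure_media_dirs(root: Path) -> list[Path]:
    """Create the media directory tree under root and return the paths."""
    created = []
    for sub in MEDIA_SUBDIRS:
        path = root / sub
        # exist_ok makes the call safe to repeat on every startup.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

A caller would typically pass Path.home() / ".polyclaw" / "media" as root.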

Incoming Media

When a user sends a file through a messaging channel:

  1. The Bot Framework SDK provides the attachment metadata
  2. incoming.py downloads the file from the Bot Framework CDN
  3. The file is saved to media/incoming/ with its original filename
  4. A media-aware prompt is built that describes the file to the agent
  5. For images, the agent can analyze the visual content
  6. For documents, the content is extracted when possible
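
Steps 3 and 4 above can be sketched as follows. The download itself is omitted, so the code starts from bytes already fetched. The function names save_incoming and build_media_prompt, the prompt wording, and the filename sanitizing are assumptions for illustration and are not taken from incoming.py.

```python
from pathlib import Path


def save_incoming(data: bytes, original_name: str, incoming_dir: Path) -> Path:
    """Write downloaded bytes to incoming_dir under the original filename."""
    # Keep only the final path component so a name like "../x" cannot escape.
    safe_name = Path(original_name).name
    if safe_name in {"", ".."}:
        safe_name = "attachment"
    incoming_dir.mkdir(parents=True, exist_ok=True)
    target = incoming_dir / safe_name
    target.write_bytes(data)
    return target


def build_media_prompt(user_text: str, saved: Path, content_type: str) -> str:
    """Describe the saved file to the agent ahead of the user's message."""
    size_kb = saved.stat().st_size / 1024
    return (
        f"[Attachment: {saved.name} ({content_type}, {size_kb:.1f} KB) "
        f"saved at {saved}]\n{user_text}"
    )
```

Putting the description ahead of the user's text is one possible design: the agent learns the file's name, type, and location before it reads the request.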

Outgoing Media

Outgoing media is handled by two complementary mechanisms.

Inline response attachments (incoming.py::extract_outgoing_attachments): file paths referenced in the agent's response text are detected via regex. Each referenced file is read from disk, base64-encoded, and sent with the reply as an inline Attachment object.
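
The inline mechanism can be illustrated with a small stand-alone sketch. It is not the real extract_outgoing_attachments: the regex (absolute POSIX paths ending in an extension), the function name extract_attachments, and the plain-dict result with name, content_type, and content_url keys are assumptions that stand in for the actual pattern and the Attachment objects.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Illustrative pattern: absolute POSIX paths that end in a file extension.
_PATH_RE = re.compile(r"(/[\w./-]+\.\w+)")


def extract_attachments(response_text: str) -> list[dict]:
    """Return one inline attachment dict per existing file path in the text."""
    attachments = []
    seen = set()
    for match in _PATH_RE.findall(response_text):
        path = Path(match)
        # Skip duplicates and anything that is not a real file on disk.
        if match in seen or not path.is_file():
            continue
        seen.add(match)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        attachments.append(
            {
                "name": path.name,
                "content_type": mime,
                # The file content travels inside the message as a data URI.
                "content_url": f"data:{mime};base64,{encoded}",
            }
        )
    return attachments
```

Checking is_file() before reading means that text which merely looks like a path, such as a URL fragment, is ignored.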

Pending directory pipeline (outgoing.py::collect_pending_outgoing): files written to media/pending/ by agent tools or skills are collected and attached on the next message delivery. The pipeline enforces a 190 KB per-file limit. Images that exceed this limit are automatically downscaled using Pillow (up to six progressive attempts at 75% scale each). Files that cannot be reduced to fit are moved to media/error/ with a .error.txt sidecar explaining the reason. Successfully sent files are moved to media/sent/.
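
The size check and error handling described above can be sketched as follows. This is an illustration under stated assumptions, not the code in outgoing.py: 190 KB is taken to mean 190 * 1024 bytes, each attempt is assumed to scale to 75% of the previous attempt, the sidecar is assumed to be named after the file with .error.txt appended, and the resize parameter is a hypothetical hook that would wrap Pillow in practice.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional

MAX_BYTES = 190 * 1024  # per-file limit (assumes 1 KB = 1024 bytes)
MAX_ATTEMPTS = 6        # progressive downscale attempts
SCALE_STEP = 0.75       # each attempt is 75% of the previous scale

# Hypothetical hook: takes a path and a scale factor, returns re-encoded bytes.
Resizer = Callable[[Path, float], bytes]


def fit_image(path: Path, resize: Resizer) -> Optional[bytes]:
    """Return bytes within MAX_BYTES, or None if six attempts are not enough."""
    data = path.read_bytes()
    scale = 1.0
    for _ in range(MAX_ATTEMPTS):
        if len(data) <= MAX_BYTES:
            return data
        scale *= SCALE_STEP
        data = resize(path, scale)
    # Check the result of the final attempt before giving up.
    return data if len(data) <= MAX_BYTES else None


def reject(path: Path, error_dir: Path, reason: str) -> Path:
    """Move a file to error_dir and write a .error.txt sidecar beside it."""
    error_dir.mkdir(parents=True, exist_ok=True)
    target = error_dir / path.name
    shutil.move(str(path), str(target))
    sidecar = error_dir / (path.name + ".error.txt")
    sidecar.write_text(reason + "\n", encoding="utf-8")
    return target
```

Injecting the resizer keeps the size logic testable without an image library. With cumulative 75% steps, the sixth attempt renders the image at roughly 18% of its original dimensions.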

Both mechanisms are invoked together in message_processor.py on every agent response.

Media in Web Chat

The web dashboard serves media files via the /api/media/{filename} endpoint. Incoming and outgoing files are accessible for viewing and downloading through the chat interface.
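
An endpoint of this shape has to map the {filename} segment to a file on disk. The sketch below shows one way that lookup might work. The helper name resolve_media_file, the list of search directories, and the rule of rejecting anything other than a bare filename are assumptions for illustration and are not taken from the dashboard code.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_media_file(filename: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file named filename, or None if not found."""
    # Reject anything that is not a bare filename, e.g. "../secret" or "a/b".
    if not filename or Path(filename).name != filename:
        return None
    for directory in search_dirs:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

A request handler would return the file when a path comes back and respond with a not-found error when the result is None.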

Supported Operations

Operation              Description
Receive images         View and analyze images sent by users
Receive documents      Process uploaded documents
Send files (inline)    Attach response-referenced files directly to replies
Send files (pending)   Attach pre-generated files from media/pending/
Image auto-resize      Downscale oversized images before sending (requires Pillow)
Image analysis         Describe image contents (via LLM vision)