Pollen is seeking Media Specialists
Job Type: Contract/freelance
Hours: Part‑time / Flexible (up to 30 hrs/week; per‑task time varies by complexity)
Rate: £22 per hour
Location: UK / Remote work
Role summary: We’re hiring detail‑oriented audio/video annotators to produce high‑quality, descriptive annotations for audiovisual assets. Annotations focus on four distinct layers: Dialogue (speech, vocal bursts/paralinguistics), Music (genre, instrumentation, texture), Sound Effects (environmental/foreground fx), and Foley / Vocal non‑speech effects.
We are also recruiting annotators to work on purely image‑based descriptions, which will focus on the following layers:
Subject/Content - People/characters, objects, actions, environment, contexts, emotion & tone.
Camera & Framing - Shot types, framing, perspective, camera movement, stability, depth of field.
Lighting - Natural vs artificial, time of day, brightness, contrast, lighting style.
Colour - Colour palette, saturation level, colour grading style.
Image Quality/Medium - Resolution (high vs low), capture type, production quality, live action vs animated.
Editing & Structure - Single take vs edited, number of cuts, pacing, transitions.
Motion & Dynamics - Movement within frame, density of motion (busy or sparse), directionality, rhythm.
Effects & Post-processing - VFX, motion graphics, discernible green screen, practical vs digital effects.
Tone/Intent - Emotional tone, production intent, audience feel, genre cues.
This does not include verbatim transcription; the work is purely descriptive labelling and contextual notes to support downstream audio/video workflows.
Key responsibilities:
Review video/audio clips and create descriptive annotations across all relevant layers: Dialogue, Music, SFX, Foley, and Visual layers (Subject, Camera, Lighting, Colour, etc.).
Apply industry-standard vocabulary for both Audio and Visual layers, using relevant film/video terminology (e.g., shot types, high-key lighting, depth of field, hard cuts, colour grading).
For Non‑Speech Audio: identify SFX vs vocalised SFX vs Foley; describe source, timing, intensity, and design.
For Background Ambience: describe dominant sources and continuity.
For Music: describe genre, tempo, primary instrumentation/vocals, structural cues (builds, drops), function (underscore vs diegetic), and any identifiable track info when present.
Judge and apply the right level of detail. Avoid irrelevant rabbit holes; prioritise actionable search terms and library‑style descriptors.
Leave annotation layers blank when they fall outside your specialty (we accept annotators who cover 2+ layers, across both Audio and Visual categories).
Follow quality, accuracy, and pacing standards. Examples of good vs poor annotations will be provided.
Preferred qualifications:
Familiarity with film, game, or video AV workflows (audio editing, sound design, visual analysis, colour grading, etc.).
Familiarity with music, sound effect, and image/video libraries, including common keywords and search terms.
Ability to read basic screenplay/scene cues and think in narrative terms.
Strong English writing skills (UK English preferred) and attention to descriptive detail.
Experience with data labelling/annotation, audio editing, dialogue editing, or video post-production is a plus.
Comfortable working independently and handling variable task complexity.
Training & equipment:
Training provided via video materials and local point‑people in each region to manage onboarding and quality.
Required: computer, reliable internet, headphones. No specialised hardware required.
Hiring notes & constraints:
We prefer candidates who are comfortable with all four layers, but can hire those who are familiar with at least two.
Deliverables:
Descriptive annotations submitted in the provided platform/template per asset.
Adherence to quality checks and example annotation standards.
Disclaimer: the role involves human annotation for AI and ML training purposes. The scope of this project does not involve creating synthetic voices intended to substitute for working AV professionals (e.g., voice cloning). We absolutely recognise the wider concerns within the AV community around AI, and we aim to approach this area thoughtfully and transparently.
How to Apply:
If you fit the description above, we would love to hear from you! Please fill out this short form: https://bit.ly/4vd9QAC (or click the button below). As part of this application, you may be asked to complete a selection of tasks that mirror the daily responsibilities of an Annotator. You can opt into any or all of these tasks. Each successful completion qualifies you for work in that specific field. Completing all three unlocks the maximum amount of available work.
Please note that these tasks are used strictly for candidate evaluation and will not be used for any other purpose. Hiring decisions are based on your performance in these tasks, and successful candidates will be selected until we are fully staffed.