Parseq tutorial 3: audio synchronisation with Stable Diffusion and Deforum

Robin Fernandes
8 May 2023 · 26:07

TLDR: This tutorial delves into the intricacies of audio synchronization for creating an underwater animation scene using Parseq. It covers the synchronization of visual elements like zoom, scene changes, and 3D rotations with the beat and pitch of the music, utilizing individual audio stems for precise timing. The guide also discusses techniques to enhance visual continuity and novelty, such as adjusting strength dips and using cadence frames for smooth transitions, ultimately resulting in a dynamic and immersive visual experience.

Takeaways

  • 🎵 The tutorial focuses on audio synchronization techniques for creating an underwater scene animation.
  • 🎶 The animation synchronizes with different audio elements such as bass drums, snare drums, and string instruments.
  • 📊 The video's zoom is timed to the bass drum beats, while snare drum beats trigger scene changes and vertical vibrations.
  • 🎹 The creator used Renoise to compose the backing music and render individual audio files (stems) for precise synchronization.
  • 🚀 The tutorial assumes familiarity with Parseq and its basic concepts, and guides through setting up the necessary tools and software.
  • 🎞️ The process involves reducing the number of fields managed by Parseq to simplify processing and speed up rendering.
  • 🖌️ Prompts are used to guide the content of the animation, with both positive and negative prompts to include or exclude specific elements.
  • 🔄 The tutorial demonstrates how to use event detection to automatically identify and label keyframes based on audio events.
  • 📉 The frame strength is dipped on specific audio events, such as snare drum hits, to introduce novelty into the animation.
  • 🛤️ A bezier curve is used to create a smooth zoom effect that starts and ends at specific keyframes labeled with 'Bassdrum'.
  • 🌀 3D rotation is added to give the illusion of the camera spinning around the subject, with the speed of rotation guided by the pitch of the strings.

Q & A

  • What is the main focus of the third Parseq tutorial?

    -The main focus of the third Parseq tutorial is audio synchronization in creating an underwater scene animation that progresses from empty and quiet to filled with various sea creatures and ends with deformed Cthulhu-like beings.

  • How is the zoom or translation along the z-axis synchronized in the animation?

    -The zoom or translation along the z-axis is synchronized to the bass drum, creating a sudden push forward or zoom effect that tapers off slowly after the drum kick.

  • What role do snare drums play in the scene changes of the animation?

    -Snare drum hits trigger scene changes, which are paired with a strength dip that introduces more novelty and visual change into the animation when they occur.

  • How does the tutorial creator manage to synchronize audio and visual elements so effectively?

    -The creator composed the backing music and used a tool called Renoise to render stems, which are individual audio files that isolate specific instruments like bass drums, snares, and strings. This allows for clean synchronization against those individual parts.

  • What is the significance of the BPM adjustment in the tutorial?

    -The BPM (beats per minute) is adjusted to match the song's rhythm. The creator composed the music at 180 BPM, but because the beat ended up in half time, it felt more like 90 BPM. The correct BPM setting helps in aligning the visual elements with the audio for a seamless experience.
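
For concreteness, here is the arithmetic as a minimal Python sketch; the frame rate is an assumption for illustration, not a value stated in this summary.

```python
# Converting BPM and FPS into frames per beat -- the quantity that
# determines where beats land on the frame timeline.
fps = 20                          # hypothetical frame rate
bpm = 90                          # 180 BPM in half time feels like 90

frames_per_beat = fps * 60 / bpm  # 20 * 60 / 90 = 13.33 frames per beat

# With the BPM left at 180, beat markers would land twice as often,
# and every visual event would fire at double the perceived tempo.
print(frames_per_beat)            # 13.333...
```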

  • How does the tutorial creator handle the issue of artifacts in the animation?

    -To handle artifacts, the creator dips the strength for exactly one frame every 8 beats, independently of any audio event, in the section before the snare comes in. This prevents the image from 'overcooking' and reduces the chance of artifacts caused by sustaining high strength over many frames.
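
A minimal sketch of that periodic dip, with hypothetical strength values (the author's exact numbers are not given here):

```python
fps, bpm = 20, 90                  # assumed settings, as above
frames_per_beat = fps * 60 / bpm   # ~13.33

base_strength = 0.725              # hypothetical steady-state strength
dip_strength = 0.4                 # hypothetical one-frame dip value

def strength_for(frame):
    """Dip for exactly one frame on every 8th beat, regardless of audio."""
    beat = frame / frames_per_beat
    nearest_dip_beat = round(beat / 8) * 8         # closest multiple of 8 beats
    dip_frame = round(nearest_dip_beat * frames_per_beat)
    return dip_strength if frame == dip_frame else base_strength
```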

  • What is the role of the strings in the 3D rotation effect?

    -The pitch of the strings is used to guide the speed of the 3D rotation, giving the illusion that the camera is spinning slightly around the subject. The rotation speed is synced to the pitch, creating a dynamic visual effect that changes with the music.

  • How does the tutorial creator ensure that the zoom effect has the same energy on every kick?

    -The creator uses a fixed duration for the bezier curve with the 'in' parameter, ensuring that the zoom effect or energy remains consistent on every kick regardless of the time between kicks.
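
A sketch of that idea in Python, evaluating a CSS-style cubic bezier (the kind designed on cubic-bezier.com) over a fixed number of frames after each kick. The control points, duration, and peak speed are illustrative, not the author's exact values.

```python
def cubic_bezier(t, x1, y1, x2, y2):
    """CSS-style cubic bezier from (0,0) to (1,1); returns (x, y) at t."""
    mt = 1 - t
    x = 3 * mt * mt * t * x1 + 3 * mt * t * t * x2 + t ** 3
    y = 3 * mt * mt * t * y1 + 3 * mt * t * t * y2 + t ** 3
    return x, y

def ease(progress, x1=0.1, y1=0.6, x2=0.3, y2=1.0, steps=64):
    """Approximate y for a given x by sampling t (good enough for a sketch)."""
    ts = [i / steps for i in range(steps + 1)]
    best = min(ts, key=lambda t: abs(cubic_bezier(t, x1, y1, x2, y2)[0] - progress))
    return cubic_bezier(best, x1, y1, x2, y2)[1]

IN_FRAMES = 12   # fixed curve duration in frames (hypothetical)

def z_speed(frame, kick_frame, peak=10.0):
    """Per-frame zoom speed: spikes on the kick, then tapers along a
    fixed-duration curve, so every kick delivers the same energy
    however far apart the kicks are."""
    elapsed = frame - kick_frame
    if 0 <= elapsed < IN_FRAMES:
        return peak * (1 - ease(elapsed / IN_FRAMES))  # fast decay, slow tail
    return 0.0
```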

  • What is the purpose of the prompt weight adjustments in the animation?

    -Prompt weight adjustments are used to control the influence of certain prompts in the generated animation. For example, increasing the prompt weight for tentacles ensures that hints of tentacles appear earlier in the animation, while adjusting it based on the pitch time series can introduce more varied and intense elements as the music progresses.
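
As a hedged sketch (the prompt term and coefficients are illustrative, not taken from the video), a pitch-modulated prompt weight could be computed per frame like this:

```python
def tentacle_weight(frame, total_frames, pitch_norm):
    """Linear ramp over the whole video plus a pitch-driven boost.
    pitch_norm is a per-frame time series normalized to 0..1."""
    linear = frame / total_frames
    return round(linear + 0.5 * pitch_norm[frame], 3)

# The resulting value would typically end up inside the prompt text,
# e.g. "(tentacles:0.8)" on a frame where the function returns 0.8.
```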

  • How does the creator achieve the final result of the underwater scene animation?

    -The creator achieves the final result by carefully synchronizing audio elements like bass drums, snares, and string pitches with visual elements such as scene changes, zoom effects, and 3D rotations. The creator also fine-tunes various parameters like strength dips, prompt weights, and time series to create a cohesive and engaging animation that matches the rhythm and mood of the music.

  • What advice does the tutorial creator give for generating variations of the video?

    -The creator suggests that by slightly modifying the prompts or using different seeds and samplers, one can generate essentially unlimited variations of the video. This allows for a lot of creative flexibility and experimentation with the animation.

Outlines

00:00

🎬 Introduction to Audio Synchronization in Parseq

The video begins with an introduction to the third Parseq tutorial, focusing on audio synchronization. The creator acknowledges the audience's interest in this topic and describes the underwater scene they will create, which starts empty and gradually fills with various sea creatures. The video explains how the zoom along the z-axis is synchronized with the bass drum, and how scene changes are triggered by the snare drum and the pitch of the strings. The creator also discusses the use of stems for clean synchronization and reassures viewers that stems are not strictly necessary for the techniques covered. The video then moves on to the setup process, including the Parseq UI and the selection of fields to control with Parseq.

05:01

🎵 Fine-Tuning Audio Events and Keyframes

In this paragraph, the creator delves into the process of fine-tuning audio events and keyframes. They discuss the detection of snare drum events and the use of a silence threshold to filter out quiet events. The creator demonstrates how to manually add or remove events and generate keyframes labeled 'Snaredrum'. The paragraph also covers the process of locking keyframes onto beat positions and adjusting the strength dips on snare keyframes using the info_match_last() function. The creator shares their preference for dipping strength for exactly one frame and provides a walkthrough of the Deforum settings used in the video.
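
Parseq performs this event detection in the browser; as a rough offline analogue, the same idea in Python with librosa might look like the sketch below. The file name, threshold, and frame rate are assumptions.

```python
import librosa

# Load the isolated snare stem at its native sample rate.
y, sr = librosa.load("snare_stem.wav", sr=None)

# Detect onsets, then drop events whose local loudness is below a threshold.
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

rms = librosa.feature.rms(y=y)[0]          # per-frame loudness, same hop length
silence_threshold = 0.02                   # hypothetical; tune per stem
loud_events = [t for f, t in zip(onset_frames, onset_times)
               if rms[min(f, len(rms) - 1)] > silence_threshold]

fps = 20                                   # assumed video frame rate
keyframes = sorted({round(t * fps) for t in loud_events})
print(keyframes)                           # candidate frames to label 'Snaredrum'
```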

10:03

📏 Adjusting Z-Translation and Bezier Curves

The creator moves on to discuss the Z translation effect triggered by the bass drum and the use of Bezier curves to achieve a desired animation effect. They explain the process of detecting bass drum events, generating keyframes labeled 'Bassdrum', and the importance of merging info fields for both bass drum and snare drum keyframes. The paragraph details the creation of a Bezier curve between two keyframes and the use of cubic-bezier.com to adjust the curve's parameters. The creator also addresses issues related to the speed of the zoom and the resetting of the zoom on each beat, offering solutions to maintain consistent energy and continuous forward movement in the animation.

15:08

🎶 Syncing 3D Rotation with String Pitch

This paragraph focuses on adding 3D rotation to the animation, aiming to create the illusion of the camera spinning around the subject. The creator describes the process of extracting pitch from the string audio and filtering out unwanted spikes. They discuss the use of normalization, clamping max values, and creating a time series for the Y rotation. The paragraph also covers the addition of X-axis panning to complement the rotation and the need for a multiplier on both rotation and panning parameters. The creator demonstrates how the pitch-driven changes are applied to the Y axis and how the X-axis panning is adjusted to compensate for the rotation.
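
An offline sketch of that pitch pipeline with librosa (Parseq's built-in extraction differs in detail; the clamp ceiling, multiplier, and frame rate are assumptions):

```python
import librosa
import numpy as np
from scipy.signal import medfilt

y, sr = librosa.load("strings_stem.wav", sr=None)

# pyin estimates a fundamental frequency per analysis frame.
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)
f0 = np.nan_to_num(f0, nan=0.0)     # treat unvoiced frames as silence

f0 = medfilt(f0, kernel_size=5)     # filter out single-frame pitch spikes
f0 = np.clip(f0, 0, 600)            # clamp the max value (ceiling assumed)
pitch_norm = f0 / f0.max() if f0.max() > 0 else f0   # normalize to 0..1

# Resample to one value per video frame to use as a rotation time series.
fps = 20
n_video_frames = int(len(y) / sr * fps)
idx = np.linspace(0, len(pitch_norm) - 1, n_video_frames).astype(int)
y_rotation_series = 2.0 * pitch_norm[idx]   # degrees per frame; multiplier assumed
```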

20:09

🌀 Refining the Animation and Addressing Artifacts

The creator addresses the refinement of the animation and the mitigation of artifacts. They discuss the high strength at the start of the video and the strategy to dip the strength every 8 beats before the snare comes in. The paragraph details the changes made to the Y rotation and X translation interpolation for the end part of the video, switching to a sine wave and using the strings to trigger scene changes. The creator also talks about enhancing the prompt by adding a linear prompt weight on tentacles and adjusting the prompt based on the pitch time series. The final result of the video is showcased, and the creator discusses the impact of increasing the cadence on short strength dips.
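
A minimal sketch of the sine-wave interpolation described for the ending (period and amplitudes are illustrative, not the author's settings):

```python
import math

def y_rotation(frame, fps=20, period_sec=4.0, amplitude=1.5):
    """Oscillate the rotation smoothly instead of following the pitch."""
    return amplitude * math.sin(2 * math.pi * frame / (fps * period_sec))

def x_translation(frame, fps=20, period_sec=4.0, amplitude=6.0):
    """Pan in anti-phase so the subject stays framed while the camera turns."""
    return -amplitude * math.sin(2 * math.pi * frame / (fps * period_sec))
```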

25:16

🚀 Conclusion and Encouragement for Variation

In the concluding paragraph, the creator expresses excitement to see the audience's creations and encourages viewers to share their work. They mention that future tutorials will likely include shorter examples and different functions, and invite suggestions for specific topics. The creator also highlights the ease of generating variations of the video by slightly modifying the prompt and encourages viewers to experiment with different seeds and samplers.

Keywords

💡audio synchronization

Audio synchronization refers to the process of aligning visual elements with audio cues in a video or animation. In the context of the video, it is crucial for creating a seamless and immersive experience where the changes in the scenes are timed with the beats and melody of the music, enhancing the storytelling and emotional impact.

💡Parseq

Parseq is a tool used for advanced control and manipulation of AI-generated images and animations. It allows users to fine-tune various parameters and settings to achieve the desired outcome. In the video, the creator uses Parseq to synchronize the animation with the music, control the generation of frames, and manage keyframes for different audio events.

💡Renoise

Renoise is a digital audio workstation (DAW) used for composing and editing music. The video creator used Renoise to compose the backing music for the animation, which allowed them to render stems – individual audio files for specific instruments. This capability is essential for achieving precise synchronization between the audio and visual elements in the animation.

💡keyframes

Keyframes are specific points in an animation or video timeline where critical changes occur, defining the start and end of animations or transitions. In the video, keyframes are used to mark the moments when certain audio events, such as the bass drum or snare drum hits, occur, allowing the animation to change scenes or adjust parameters in response to the music.

💡3D rotation

3D rotation refers to the process of spinning or turning an object or camera around one or more of its axes in three-dimensional space. In the video, 3D rotation is used to create a sense of movement and dynamism, simulating the camera spinning around the subject in response to the pitch of the strings from the music.

💡pitch

In music, pitch refers to the perceived frequency of a sound, which determines how high or low a note sounds. In the video, the pitch of the strings is extracted and used to control the speed and direction of the 3D rotation, creating a visual representation of the music's melody and adding a dynamic element to the animation.

💡time series

A time series is a sequence of data points collected or recorded at regular time intervals. In the context of the video, time series are derived from the audio stems to control various visual elements such as rotation and scene changes, allowing for a direct visual interpretation of the audio.

💡prompts

Prompts are inputs or text descriptions provided to AI systems to guide the generation of specific outputs. In the video, prompts are used to define the visual elements and themes of the animation, such as an underwater scene filled with squid and krakens, which are then combined with the audio synchronization to create a cohesive narrative.

💡cadence

Cadence, in the context of Deforum and Parseq, is the interval at which frames are actually diffused: with a cadence of N, only every Nth frame is generated by the model, and the frames in between are interpolated. A higher cadence smooths motion and speeds up generation, but very short effects, such as a one-frame strength dip, can land on frames the model never sees. In the video, the creator discusses how changing the cadence influences the visibility of strength dips and the overall quality of the animation.
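
A small sketch of why a one-frame strength dip can disappear at higher cadence (the exact frames Deforum diffuses may differ; this assumes every Nth frame):

```python
def diffused_frames(total_frames, cadence):
    """With cadence N, only every Nth frame is generated by the model."""
    return set(range(0, total_frames, cadence))

dip_frame = 107                    # hypothetical frame carrying a strength dip
for cadence in (1, 3):
    hit = dip_frame in diffused_frames(400, cadence)
    print(f"cadence {cadence}: dip frame diffused -> {hit}")
# cadence 1 -> True; cadence 3 -> False, so the dip never reaches the model
```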

💡Zoe depth

ZoeDepth is a monocular depth estimation model that Deforum can use to build per-frame depth maps for its 3D camera motion. More accurate depth maps make rotations and translations around the subject look more convincing. In the video, the creator mentions using Zoe depth to achieve further rotations, indicating its role in enhancing the perceived depth of the animation.

💡Midas weight

MiDaS is another depth estimation model, and the Midas weight parameter in Deforum controls how strongly its depth predictions contribute to the 3D transformations. In the video, the creator uses the Midas weight to shape the extent of the 3D rotations, finding that a weight of 0.4 works well for the intended effect.

💡bezier curve

A Bézier curve is a parametric curve defined by control points, widely used in computer graphics to create smooth, continuous transitions. In the video, the creator uses Bézier curves to control the zoom effect in response to the bass drum beats, allowing for a natural and dynamic change in the camera's distance from the subject.

Highlights

Introduction to audio synchronization in the third Parseq tutorial.

Creating an underwater scene animation with audio-driven visual elements.

Syncing zoom along the z-axis to the bass drum and scene changes to the snare drum.

Utilizing 3D rotation guided by the pitch of the strings for an immersive audio-visual experience.

The use of Renoise to render stems for cleaner audio synchronization.

Explanation of Parseq's client-side audio processing and its privacy benefits.

Reducing the number of fields managed by Parseq to simplify processing and speed up rendering.

Incorporating prompts and negative prompts to shape the generated animation.

Setting up FPS and BPM to match the animation with the composed music.

Dipping the strength when the snares come in for scene changes and novelty.

Using event detection to automatically identify snare hits and generate keyframes.

Adjusting the bezier curve for the Z translation to match the desired visual effect of the bass drum.

Synchronizing the speed of 3D rotation to the pitch of the strings for a dynamic camera effect.

Normalizing and filtering the pitch data for a more accurate representation of the music's influence on the animation.

Combining Y rotation and X translation to create the illusion of the camera spinning around the subject.

Adjusting the animation's start and end with techniques like strength dips and changing interpolation methods.

Enhancing the prompts and using pitch time series to introduce thematic variations like 'horrifying, disfigured faces' and 'shipwreck, ruin'.

The potential for unlimited variations and the ease of generating completely different scenes by modifying the prompts.