Animation is too hard.
To make a 30-second clip, you usually need to learn Blender, hire voice actors, set keyframes by hand, and render for twelve hours. I wanted to make memes, shitposts, and storytelling clips, and I wanted them done in the time it takes to drink a coffee.
So, I built the Stickverse.
It is a scrappy, Python-based animation pipeline that turns a raw text story into a fully voiced, lip-synced (well, "lip-flapped") video. It uses Google's Gemini to parse the script, a CLI tool to record your own bad voice acting, and OpenCV to draw the frames.
Here is how I built it, and how you can use it to create your own episodes of the Stickverse.
The pipeline consists of three distinct Python scripts:
1. `parser.py`: Uses Gemini to turn a text block into a storyboard (JSON).
2. `director.py`: A CLI tool that prompts you to record lines one by one.
3. `animator.py`: The engine that stitches audio and vector graphics into an MP4.

Let's break down the code.
The hardest part of procedural animation is structure. If I write a story, I don't want to manually tag every single line with { "character": "Steve" }.
I delegated this to the Gemini API. I feed it raw text, and via a system prompt, it returns clean JSON.
Here is the secret sauce in parser.py. The system prompt enforces strict rules so the animator doesn't crash later:
```python
system_prompt = """
You are a Screenplay parsing engine.
Convert the following raw text story into a structured JSON animation script.

RULES:
1. Identify every character speaking.
2. Identify scene changes (backgrounds).
3. Output JSON ONLY. No markdown, no explanations.

JSON STRUCTURE:
[
  { "type": "background", "description": "desert" },
  { "type": "speak", "character": "Walter", "text": "Jesse, we have to cook." }
]
"""
```
If I feed it a text file where I wrote "Walter yells at Jesse about API keys," Gemini breaks it down into the exact scenes and lines I need to record.
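For reference, here is a minimal sketch of what that call can look like, reusing the `system_prompt` defined above. It assumes the `google-generativeai` package; the model name, the fence cleanup, and the file names are my assumptions, not necessarily the repo's exact code.

```python
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def parse_story(raw_text: str) -> list:
    """Send the story to Gemini and return the parsed storyboard."""
    response = model.generate_content(system_prompt + "\n\nSTORY:\n" + raw_text)
    text = response.text.strip()
    # If the model wrapped the JSON in markdown fences anyway, slice out the array
    if not text.startswith("["):
        text = text[text.find("[") : text.rfind("]") + 1]
    return json.loads(text)

with open("story.txt") as f:
    plan = parse_story(f.read())

with open("script_plan.json", "w") as f:
    json.dump(plan, f, indent=2)
```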
Once we have a script_plan.json, we need audio.
I didn't want to use AI Text-to-Speech (TTS) because AI voices have no soul. They can't capture the panic of a man realizing he missed a deployment deadline.
I wrote director.py using the sounddevice library. It reads the JSON, clears the terminal, and acts like a teleprompter.
```python
# snippet from director.py
print(f"🗣️ CHARACTER: {char.upper()}")
print(f"📝 LINE: \"{text}\"")

# It waits 15 seconds for you to get into character
# Then records automatically when you hit Ctrl+C
```
It saves every line as a separate .wav file (line_0_Walter.wav) and updates the JSON to link the text to the audio.
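If you want to roll your own version, here is a rough sketch of the recording loop with `sounddevice`. It is hedged: I've skipped the 15-second countdown, and the helper name, sample rate, and WAV writing via `scipy.io.wavfile` are my assumptions, not necessarily what `director.py` does.

```python
import numpy as np
import sounddevice as sd
from scipy.io import wavfile

SAMPLE_RATE = 44100

def record_line(filename: str) -> None:
    """Record from the default mic until Ctrl+C, then save a mono WAV."""
    chunks = []

    def callback(indata, frames, time, status):
        chunks.append(indata.copy())

    print("Recording... hit Ctrl+C when the line is done.")
    try:
        with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=callback):
            while True:
                sd.sleep(100)  # keep the stream alive until interrupted
    except KeyboardInterrupt:
        pass

    if chunks:
        audio = np.concatenate(chunks, axis=0)
        wavfile.write(filename, SAMPLE_RATE, audio)
        print(f"Saved {filename}")

# e.g. record_line("line_0_Walter.wav")
```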
This is where the "Stickverse" comes alive. I didn't use a game engine. I used OpenCV (to draw lines) and MoviePy (to stitch frames).
I couldn't be bothered to train a neural network for lip-syncing. Instead, I used scipy.io.wavfile to analyze the volume of the audio file.
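In practice that just means reading the WAV and mapping loudness to a jaw drop. Here is a sketch of the idea; the window size and normalization are my guesses, not the repo's exact math.

```python
import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read("line_0_Walter.wav")
samples = samples.astype(np.float32)
if samples.ndim > 1:                       # collapse stereo to mono
    samples = samples.mean(axis=1)
peak = max(float(np.abs(samples).max()), 1e-6)

def mouth_open_amount(t: float, window: float = 0.05) -> float:
    """Loudness around time t, squashed into a 0..1 jaw-drop factor."""
    start = int(t * rate)
    chunk = samples[start : start + int(window * rate)]
    if chunk.size == 0:
        return 0.0
    rms = np.sqrt(np.mean(chunk ** 2))
    return min(1.0, float(rms) / (0.3 * peak))  # louder line, wider flap
```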
```python
# snippet from animator.py
def draw_frame(t, character_speaking, mouth_open_amount):
    # ... drawing the body ...
    gap = int(mouth_open_amount * 20)  # Calculate jaw drop

    # Draw top of head moving up and down
    top_center = (x, 300 - gap)
    cv2.ellipse(img, top_center, (50, 50), 0, 180, 360, color, -1)
```
The result is a flappy-headed aesthetic that looks like South Park meets a terminal window.
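For completeness, here is one way the frames and audio could be stitched together. This sketch assumes MoviePy 1.x's `moviepy.editor` API (2.x renames `set_audio` to `with_audio`), assumes `draw_frame` returns the finished BGR frame, and reuses the `mouth_open_amount` helper from the earlier sketch; it is not guaranteed to match `animator.py` line for line.

```python
import cv2
from moviepy.editor import AudioFileClip, ImageSequenceClip

FPS = 24
audio = AudioFileClip("line_0_Walter.wav")

frames = []
for i in range(int(audio.duration * FPS)):
    t = i / FPS
    img = draw_frame(t, "Walter", mouth_open_amount(t))   # BGR frame from OpenCV
    frames.append(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))   # MoviePy expects RGB

clip = ImageSequenceClip(frames, fps=FPS).set_audio(audio)
clip.write_videofile("episode.mp4", codec="libx264", audio_codec="aac")
```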
https://youtu.be/vwO11W2zwRw?embedable=true
To test the system, I fed it a script about Walter and Jesse arguing about Hackernoon editorial standards.
The Prompt (that I totally followed to the letter):
The Result:
The code is open source. It is messy, it is funny, and it works.
The beauty of the Stickverse is that it is strictly code-based.
Want to change how a character looks? Add a `cv2.rectangle` to the head logic.

You can find the repo here: [https://github.com/damianwgriggs/The-Stickverse](https://github.com/damianwgriggs/The-Stickverse)
Clone it, modify the draw_frame function, and create your own episode. We have to cook.
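For example, giving a character a hat is just a couple more OpenCV calls inside `draw_frame`. This helper is purely illustrative; the name, coordinates, and colors are made up.

```python
import cv2

def draw_hat(img, head_x: int, head_top_y: int) -> None:
    """Two filled rectangles: a brim and a crown, sitting on top of the head."""
    cv2.rectangle(img, (head_x - 60, head_top_y), (head_x + 60, head_top_y + 10), (0, 0, 0), -1)
    cv2.rectangle(img, (head_x - 35, head_top_y - 50), (head_x + 35, head_top_y), (0, 0, 0), -1)
```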
:::info
Genesis Art Engine Prompt for featured photo: A blazing hot desert with a radiating sun. Meant to fit the “Breaking Stories” theme.
:::