
Seamless Shift: How Reading Turned Into Listening

2025/12/24 18:23
7 min read

Modern book apps make switching between reading and listening feel obvious. You read on the train, tap a button, and your car speakers pick up exactly where you left off.

When I worked in the industry in 2015, it didn’t exist.

That year, the MyBook team built one of the first working systems to sync book text with professional studio narration. What started as a hackathon experiment became an industry standard that now extends into education and media.

The Gap Between Text and Audio

In 2014, ebooks and audiobooks were disconnected products. Different apps, different catalogs, different licensing deals. Publishers and users treated them as entirely separate.

MyBook had launched in 2012 as a subscription service for digital books, combining a reading app with social features where users could share what they were reading. By 2014, they'd added an audiobook section as a standalone platform within the app. The audio service was technically simpler to build: audiobooks have fewer metadata fields and cleaner file structures, while digital books demand more complex formatting and richer metadata.

Testing showed demand for audiobooks. Subscriptions grew steadily as users discovered the audio catalog. Then came unexpected feedback: let us switch between formats without losing our place.

Read during the morning commute when you can concentrate. Listen while driving when you can't look at a screen. Switch back to reading in a cafe where you want to highlight passages. The formats didn't talk to each other. Starting the audiobook meant hunting through chapters to find where you'd stopped reading, often settling for "approximately the right chapter" and losing several minutes of content.

At that time, Amazon, the industry leader, had Whispersync for select Kindle titles, but it worked only within their ecosystem and covered limited books. Most platforms treated digital books and audiobooks as fundamentally different products with no expectation they should connect.

Professional narrators don’t read mechanically. They pause for dramatic effect. They vary pacing between action scenes and descriptive passages. They emphasize different words than a text-to-speech engine. A 300-page book becomes a 12-hour recording split across multiple files. Automatically mapping every sentence to its exact timestamp seemed impossible at scale.

Building the Prototype

During a MyBook hackathon where teams could experiment with risky ideas, two engineers decided to tackle text-to-audio synchronization.

They found an academic paper describing a theoretical approach to audio-text matching. The paper outlined a mathematical concept for creating an audio “fingerprint” for each text segment, then finding where it appears in a professional recording.

The system had three steps.

First, split the book text into processable segments. Run each through a text-to-speech engine to generate synthetic audio. This isn’t the audio users hear. It’s a reference version for comparison purposes only.
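The splitting step can be sketched in a few lines of Python. This is an illustration, not MyBook's actual code, and the `tts_engine.synthesize` call in the comment is a hypothetical API since the article names no specific engine:

```python
import re

def split_into_segments(text, max_chars=200):
    """Split book text into sentence-level segments, merging short
    sentences so each reference clip has enough audio to fingerprint."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    segments, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            segments.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        segments.append(current)
    return segments

book = ("The train left at dawn. Rain streaked the windows. "
        "She opened the book again and found her place.")
for seg in split_into_segments(book, max_chars=60):
    print(seg)
    # in production each segment would then be passed to a TTS engine:
    # reference_audio = tts_engine.synthesize(seg)   # hypothetical API
```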

Second, build waveform graphs for each synthetic segment. These graphs plot audio energy over time with peaks and valleys representing volume and frequency patterns. Each sentence creates a unique audio signature, like a fingerprint that can be identified in the professional recording.
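A minimal way to build such a signature is a frame-by-frame RMS energy curve: loud syllables produce peaks, pauses produce valleys. This is a simplified stand-in for whatever feature representation the team actually used, which the article doesn't specify:

```python
import math

def energy_envelope(samples, frame=256):
    """Reduce raw audio samples to a coarse RMS-energy curve,
    one value per frame of `frame` samples."""
    env = []
    for i in range(0, len(samples) - frame + 1, frame):
        window = samples[i:i + frame]
        env.append(math.sqrt(sum(x * x for x in window) / frame))
    return env

# toy "audio": a loud sine burst followed by near-silence
samples = [math.sin(i * 0.3) for i in range(1024)] + [0.01] * 1024
env = energy_envelope(samples)
assert env[0] > env[-1]   # speech frames are louder than the pause
```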

Third, compare these synthetic patterns against the actual audiobook recording. Pattern-matching algorithms search for similar shapes. When the synthetic waveform for a paragraph closely matches a section at timestamp 14:23 in the professional recording, you’ve found your sync point.
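The matching step can then be a sliding comparison of the synthetic fingerprint against the recording's fingerprint. The brute-force mean-squared-error scan below is an illustration of the idea, not the team's actual algorithm, and the 16 kHz / 256-sample frame figures are assumed for the frames-to-seconds conversion:

```python
def best_offset(reference, recording):
    """Slide the short reference envelope across the long recording
    envelope; return the offset with the lowest mean squared error."""
    n = len(reference)
    best, best_err = 0, float("inf")
    for off in range(len(recording) - n + 1):
        err = sum((reference[i] - recording[off + i]) ** 2
                  for i in range(n)) / n
        if err < best_err:
            best, best_err = off, err
    return best

reference = [0.1, 0.9, 0.8, 0.1]                   # synthetic fingerprint
recording = [0.0] * 50 + reference + [0.0] * 50    # hidden at offset 50
offset = best_offset(reference, recording)
seconds = offset * 256 / 16000   # frames -> seconds, assuming 16 kHz audio
print(offset, round(seconds, 2))  # offset == 50, i.e. 0.8 s in
```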

“We literally aligned the graph of the text with the audio file to let the reader move between formats,” one of the engineers explains. “At the time, it looked crazy; today, it’s just expected.”

Production implementation brought additional challenges. Audiobooks are typically split into chapters or parts, sometimes dozens of separate files for a long book. The system needed to work in stages: first identify which audio file contains the text segment, then pinpoint the exact timestamp within that file.
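That staged search can be sketched as a coarse scan over every file followed by an exact scan inside the winner. The stride trick here is a toy simplification that assumes the match lands on a stride boundary; a production system would use a more robust coarse feature than a strided MSE scan:

```python
def locate(segment_env, audio_files, coarse_stride=8):
    """Two-stage search: score each file with a coarse strided scan,
    then find the exact frame offset inside the best-scoring file."""
    def mse_at(ref, rec, off):
        return sum((ref[i] - rec[off + i]) ** 2
                   for i in range(len(ref))) / len(ref)

    def best(ref, rec, stride=1):
        offsets = range(0, len(rec) - len(ref) + 1, stride)
        return min(offsets, key=lambda o: mse_at(ref, rec, o))

    # Stage 1: coarse scan of every file to pick the likeliest chapter
    scores = {name: mse_at(segment_env, env,
                           best(segment_env, env, coarse_stride))
              for name, env in audio_files.items()}
    winner = min(scores, key=scores.get)
    # Stage 2: exact offset within that single file
    return winner, best(segment_env, audio_files[winner])

files = {"ch01": [0.0] * 40,
         "ch02": [0.0] * 20 + [0.2, 0.9, 0.7] + [0.0] * 17}
winner, off = locate([0.2, 0.9, 0.7], files, coarse_stride=4)
print(winner, off)  # the segment lives in ch02 at frame 20
```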

They added machine learning models to improve accuracy. The initial pattern matching worked for standard cases, but ML helped handle variations. Some narrators pause significantly longer between sentences. Some read much faster or slower than TTS engines predict. Some add vocal effects or character voices that throw off basic matching. The ML layer learned to recognize and account for these narrator-specific patterns.
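The article doesn't say which models the team used. One classic (non-ML) technique for tolerating tempo differences between a synthetic reference and a human narrator is dynamic time warping, shown here purely to illustrate the alignment problem, not as MyBook's method:

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two energy envelopes:
    tolerates one sequence being stretched or compressed in time."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match frames
    return cost[len(a)][len(b)]

fast = [0.1, 0.9, 0.1]                  # phrase read quickly
slow = [0.1, 0.1, 0.9, 0.9, 0.1, 0.1]   # same phrase at half speed
other = [0.9, 0.1, 0.9, 0.1, 0.9, 0.1]  # a different phrase
assert dtw_distance(fast, slow) < dtw_distance(fast, other)
```

A plain sample-by-sample comparison would score `slow` as a poor match; warping the time axis lets the same phrase match regardless of reading speed.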

The hackathon prototype proved the concept worked. The team spent several months rebuilding it for production, processing MyBook’s entire catalog of tens of thousands of books.

User Response and Market Impact

MyBook launched the feature and users adopted it immediately. Usage patterns showed something interesting: people don’t mentally separate content into “text” vs “audio.” They think about the story they’re consuming, and format is just a constraint of their current environment. Reading on the subway and listening in the car aren’t different activities. They’re continuing the same book in different environments.

MyBook had an advantage in coverage. Amazon’s Whispersync worked only for titles available in both Kindle and Audible formats, and only within Amazon’s closed ecosystem. MyBook processed their own entire catalog automatically. If both formats existed on the platform, syncing worked without special setup.

The business model shifted. Users who subscribed for ebooks suddenly had access to audio at no additional cost. A book purchased in one format became effectively available in both. Competing services that didn’t offer format flexibility started feeling limited by comparison.

Within two years, other platforms adopted similar technical approaches. The feature went from experimental to expected. Book apps that couldn’t switch formats felt outdated.

Beyond Books

The same technical approach spread to other content types, solving a consistent problem: people consume information in different contexts with different constraints.

EdTech platforms adopted it for lecture materials. Students take notes during class, then listen to recordings while commuting. Automatic syncing between written notes and audio timeline means tapping a note jumps to that exact moment in the recording.

Podcast apps added transcripts with timestamp syncing. Read an interview at your desk when you can’t wear headphones, then switch to audio at the gym. The app starts playing where you stopped reading.

News organizations tried hybrid formats. Video platforms added it for educational content. The pattern repeats: people need visual focus for complex information sometimes, but their hands are busy other times. They’re in noisy environments where audio doesn’t work, or they want to skim quickly, which only text supports.

Before format syncing, switching contexts meant struggling with the wrong format or abandoning content. Commuters who started reading at home often didn’t finish books because continuing in audio during their drive required too much friction.

From Innovation to Infrastructure

Ten years later, format switching is standard in book apps. Behind the simple button sits complex technical work: pattern matching algorithms, multi-stage file search, ML models adjusting for narrator variations.

The evolution followed a familiar path. An academic paper described what might be possible. MyBook turned that possibility into working software. Other platforms saw it work and built their own versions. Each iteration refined the approach until format flexibility stopped being a feature and became basic infrastructure.

Users didn’t need convincing that switching between reading and listening was valuable. They already wanted it. What they needed was for someone to solve the hard technical problems that made it possible.

Fedor Nasyrov

Development Team Lead at Exness

Fedor has more than 10 years of experience in software development and product engineering. Prior to Exness, he led development teams at MyBook, where he worked on subscription products and international expansion across EU markets (including GDPR compliance and local payments), and contributed to building and scaling a large digital reading platform. His technical background includes Python, Django, PostgreSQL, and JavaScript.
