Meta recently unveiled NotebookLlama, a project designed to emulate the popular podcast generation feature in Google's NotebookLM. NotebookLlama relies on Meta's own Llama models for most of its processing.
Like NotebookLM, NotebookLlama produces conversational, podcast-style summaries of text files that users upload. It first creates a transcript from the file, for example by converting a PDF of a news article or blog post into text. It then adds dramatization and interruptions to the transcript before feeding the result to an open-source text-to-speech model.
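The stages described above can be pictured as a simple chain of functions. The following Python sketch is purely illustrative and is not NotebookLlama's actual code: every name in it (`PodcastPipeline`, `parse_script`, and the `fake_*` stand-ins) is hypothetical, the "Speaker: line" script format is an assumption, and the language model and text-to-speech calls are replaced by stub functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type aliases; none of these names come from NotebookLlama.
TextModel = Callable[[str], str]           # prompt text in, generated text out
SpeechModel = Callable[[str, str], bytes]  # (speaker, line) in, audio bytes out


def parse_script(script: str) -> List[Tuple[str, str]]:
    """Split 'Speaker: line' rows into (speaker, line) pairs, skipping other rows."""
    turns = []
    for row in script.splitlines():
        speaker, sep, line = row.partition(":")
        if sep and line.strip():
            turns.append((speaker.strip(), line.strip()))
    return turns


@dataclass
class PodcastPipeline:
    extract_text: Callable[[str], str]  # file path -> plain text
    write_transcript: TextModel         # source text -> two-speaker transcript
    dramatize: TextModel                # transcript -> transcript with interruptions
    synthesize: SpeechModel             # one spoken line -> audio clip

    def run(self, path: str) -> List[bytes]:
        text = self.extract_text(path)
        transcript = self.write_transcript(text)
        script = self.dramatize(transcript)
        # One audio clip per speaker turn, in order.
        return [self.synthesize(speaker, line) for speaker, line in parse_script(script)]


# Stub stages standing in for PDF extraction, the language model, and the TTS model.
def fake_extract(path: str) -> str:
    return f"Contents of {path}"


def fake_transcript(text: str) -> str:
    return f"Host: Today we cover this. {text}\nGuest: Sounds good."


def fake_dramatize(transcript: str) -> str:
    return transcript.replace("Guest: ", "Guest: Wait, hold on! ")


def fake_tts(speaker: str, line: str) -> bytes:
    return f"[{speaker}] {line}".encode("utf-8")


pipeline = PodcastPipeline(fake_extract, fake_transcript, fake_dramatize, fake_tts)
clips = pipeline.run("article.pdf")
```

Keeping each stage behind a plain callable means a better text-to-speech model, or a different way of drafting the transcript, could be swapped in without touching the rest of the chain.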
However, NotebookLlama's output does not match the quality of NotebookLM's. According to reports, the generated speech sounds noticeably robotic, and the voices occasionally talk over each other.
The Meta researchers behind the project say the voice quality can be improved by using more capable models. They wrote on NotebookLlama's GitHub page: "The text-to-speech model is the primary constraint on the naturalness of the voice. Furthermore, another approach to crafting podcasts involves having two agents debate topics of interest and drafting the podcast outline. Currently, we are only utilizing one model to draft the podcast outline."
NotebookLlama is not the first attempt to replicate NotebookLM's podcast feature, and other projects have had varying degrees of success. Notably, no project to date, NotebookLM included, has fully solved the pervasive "hallucination" problem, so AI-generated podcasts inevitably contain some fabricated information.