What would it take to write one million words?

I remember that when I was participating in the National Novel Writing Month challenge (NaNoWriMo), I was able to write close to 500 words in 15 minutes. And these were not too bad. (Granted, I was writing in French, which is much easier for me.)

It's important I start writing fast again, because I have an idea for a project which requires a large dataset, and I don't want it to be any random kind of dataset. I want to use words which capture my voice.

For a few days now, I've been obsessed with the idea of better understanding what makes a person's voice. Our lives are ephemeral. Once we die, our voice disappears. The only voices that survive, the ones that live past a person's expiration date, are the ones of famous authors and artists whose work transcends the centuries.

I have an idea of what Molière or Cicero sounded like. Obviously, nobody sounds the same way when they write compared to when they speak; that's why I only have an idea and not a perfect representation of Molière's and Cicero's voices. But the fact that I have access to their work, that their work has outlived them, gives me a glimpse into what they sounded like.

The main question I'm struggling with right now is: is my voice in English the same as my voice in French? And the answer is, obviously: no. I don't have the same voice in French and in English. I'm not even sure someone who is truly bilingual has the same voice in two different languages. (by voice, I don't mean tone or pitch, but truly, the personal voice of a human being, their unique signature which I don't think has been truly captured and analysed in any form so far)

The project I have in mind consists of analysing my personal notes and extracting interesting elements out of them. At first, I'll keep it practical: generating ideas to write about based on the themes I already wrote about. This is why I started learning Python today. From the little I have seen, there are many libraries making it quite easy to extract such meaning from many content formats. However, long term (by the way, notice how I wrote "however" here, while I would have used "but" if I was having this conversation face-to-face with someone), my idea is to extract my own voice from these notes. I'm honestly not even sure what that "voice" is. I'll probably need to read some more research. Maybe ask smart people who have spent much longer thinking about this.

Like most people, I don't want to die. And I don't want to lose the people I love. Of course, capturing one's voice—truly capturing it—opens the door to many Black Mirror scenarios with zombie-like robots used as placebo companions for grieving people. But that's not really what I have in mind. I'm specifically interested in the written medium. As much as I would want to capture my mom's voice, what I have in mind is not so much—again—about one's tone or pitch. It's about their personal voice: the way they think and the way they express themselves.

This experiment would only work with people who tend to think out loud. And writers are exactly the kind of people who tend to think out loud. Better yet, they think out loud in a way that makes it easy to perform data analysis.

To narrow it down even further, it would only work with extremely prolific writers. To capture a person's voice, you would probably need a mix of their conversations, notes, and essays. (I'm not sure how well these various formats could be cleaned up to work together, but I assume that's something I'll learn soon while studying Python)

I'm a pretty prolific writer. I write about three articles a week on Ness Labs. The problem is these articles don't capture my voice. They're factual research articles that don't do a great job at capturing the intricacies of all the weird ways my mind sometimes wanders. Going back to the language issue, not only would doing this work in English mean I'd only capture part of my voice (more accurately, one of my voices), but if the English source material came from my professional blog, it would also miss huge chunks of how I actually think and express myself.

Part of me would like to honour my French heritage and do this exercise in French. But I have lost the habit of typing in French, even of thinking in French, and it would take me forever to produce the same amount of content. My muscle memory is unfortunately gone in my French-thinking mind. (I believe it could absolutely be revived if I made it a priority by reading French books and buying a French keyboard)

So—English it's gonna be. I just want to acknowledge (I can never write this word properly and always have to use autocorrect) the fact that I'm leaving a massive part of my identity on the table. My French heritage (the language, the culture, the mental models) is not 50% of my identity, it's much, much more. Genetics aside, I'm 60% French, 30% Algerian, 10% Lost. Despite living, thinking, sometimes dreaming in English, no part of me feels British. But, for practical reasons, I can't tackle this project in French.

The idea of this project is to conduct a data analysis of some sort (I'm way too early to even have any idea of what that looks like) on my own notes and to: first, suggest topics to write about based on recurring themes, second, capture my voice. For both parts, I have no idea what the process would entail, and I'm starting this journey completely blind. The interesting part will be to use the notes I take during this project as material for the data analysis.

Hence the title of the first note in this project: what would it take to write one million words? About 2,740 words per day for one year. A thousand 1000-word articles. Whatever the approach, quite a bit of time and energy. Which brings me back to NaNoWriMo and my average of 500 words in 15 minutes. (Bear in mind that it was in French, a language that's, I think, much more verbose than English.) If I was truly dedicated to this, I could write one million words in one year. That would make for a pretty solid snapshot of my voice at t = 2020.
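
As a rough check of that arithmetic, here is a minimal sketch. It assumes a 365-day year and that the 500-words-in-15-minutes pace holds for every session, which is optimistic; the variable names are placeholders.

```python
import math

TARGET_WORDS = 1_000_000
DAYS_IN_YEAR = 365           # assumption: writing every single day
WORDS_PER_SESSION = 500      # the NaNoWriMo pace mentioned above
MINUTES_PER_SESSION = 15

# Words needed per day to reach the target in one year, rounded up.
words_per_day = math.ceil(TARGET_WORDS / DAYS_IN_YEAR)

# Writing time per day if the NaNoWriMo pace could be sustained.
words_per_minute = WORDS_PER_SESSION / MINUTES_PER_SESSION
minutes_per_day = words_per_day / words_per_minute

print(f"{words_per_day} words per day")
print(f"about {round(minutes_per_day)} minutes of writing per day")
```

At that pace, the daily target would take a little under an hour and a half of actual typing.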

But let's not get carried away. From a practical standpoint, I'm going to block time every day for one hour of free flow thinking. I should probably set a target number of words if I want to get to one million words, but I want to make it more about the thinking time than the number of outputted words for now. I need volume, but I need the words to sound like me and to reflect what I think, versus writing a bunch of platitudes just to hit a word count.

I'll write these free flow notes into Roam, then paste them into my public-facing digital garden. That's something I'm still a bit hesitant about. I want to be able to think without worrying about what other people may think. But it's a good exercise to kill these inner worries. On the technical side of things, I'll also be able to export a JSON or a CSV file with all these notes, which will include a timestamp and any tags I wish to add. I have absolutely no idea how this will help in the future, but from skimming through some articles, Python plays nicely with JSON and CSV files.
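
To give an idea of what "plays nicely" could look like, here is a small sketch using Python's built-in `json` module. The structure of the export (a list of notes with `timestamp`, `tags`, and `text` fields) is purely an assumption for illustration; a real Roam export may look quite different.

```python
import json

# Hypothetical export: the field names below are assumptions, not Roam's format.
sample_export = """
[
  {"timestamp": "2020-01-01T09:00:00", "tags": ["voice"],
   "text": "What would it take to write one million words?"},
  {"timestamp": "2020-01-02T09:00:00", "tags": ["python"],
   "text": "I started learning Python today."}
]
"""

TARGET_WORDS = 1_000_000


def count_words(text):
    """Count whitespace-separated words, a crude but simple measure."""
    return len(text.split())


notes = json.loads(sample_export)

# Word count per note, keyed by its timestamp.
words_per_note = {note["timestamp"]: count_words(note["text"]) for note in notes}
total_words = sum(words_per_note.values())
remaining_words = TARGET_WORDS - total_words

print(words_per_note)
print(f"{total_words} words written, {remaining_words} to go")
```

The same running total could feed a progress tracker toward the one million words.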

To summarise, I need to achieve three things. (not in order) First, learn enough Python to be able to perform basic data analysis and extract recurring themes to suggest ideas of what to write about based on the content of my notes. Even though I have no idea how to get there right now, this feels like the more practical part of this whole project. Second, extract my voice from my notes. That's the more esoteric part of the project. I don't even know what my "voice" means—what variables I would need to create, what the output would look like. This is probably where I'll need to start learning about machine learning, neural networks, and all that jazz. (yes, at this point I have no idea if these make any sense in the context of this project) Third, write enough personal notes so that whatever analysis I perform is meaningful, aka statistically significant.
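
For the first goal, one very crude starting point might be counting which words come up most often across notes. This is only a sketch: word frequency is a stand-in for "recurring themes", not a real theme-extraction method, and the stop-word list, the `top_terms` helper, and the sample notes are all made up for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "i", "my",
    "it", "that", "this", "what", "for", "on", "with",
}


def top_terms(notes, n=3):
    """Return the n most frequent non-stop-words across all notes."""
    counts = Counter()
    for text in notes:
        # Lowercase the text and keep runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


# Made-up sample notes, not taken from any real export.
sample_notes = [
    "My voice in French is not my voice in English.",
    "Capturing a voice means capturing how a person thinks.",
    "Writing every day is how I find my voice.",
]

print(top_terms(sample_notes, n=1))
```

Frequent words could then serve as loose prompts for what to write about next. Capturing a "voice" would almost certainly need something richer than this.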

In terms of strategy for producing enough personal content (beyond Ness Labs), I will probably use a mix of in-the-moment inspiration and the big questions spaced repetition system, cycling through some of the important questions I care about and using them as prompts for a daily thinking practice.

One important rule I have for writing these (and that I have been using to write the present note) is to not look up anything online. When I write "proper" articles for Ness Labs, I spend a lot of time looking up research papers, or just checking for better synonyms in dictionaries, or using Google Translate to figure out how to say something. The goal of these notes is to capture how I think, so there will be no external input during any such thinking session. This may result in grammatical mistakes and factual errors, but I'm willing to take that risk to avoid getting into research mode.

Finally, I'll also create a spreadsheet to track my progress towards one million words. I love tracking stuff in spreadsheets. It gives you a nice illusion of control. And when you lose your motivation, it does help to look back so you don't feel like breaking your streak. (side note that I should really write about my mom and my grandma at some point - if I do I will link to it from here, or automatic backlinks will take care of it)

In the spirit of this very first free flow thinking session, the reason why I thought about my mom (and I can't think about my mom without thinking about my grandma) is that I wrote about losing motivation, which is something my mom doesn't really believe in. My mom truly, deeply believes in the power of compound interest. There is no small gain. As long as you keep on pushing forward, good things will happen. She comes from an extremely poor family, and this mentality has served her well. She went from working in a cotton swab factory to cleaning houses to waitressing to being a receptionist and then a director in a private clinic. But I'm not my mom, so I need a spreadsheet.

Again, feeling a bit weird putting this on the web, but also a bit excited because this is what blogging used to be about.