PhD reflections

Image: engine room control panels and gauges.
Taking the wheel of the PhD. (Photo generated by Gemini)

I finished my PhD in November 2024! I’m grateful to everyone who supported me along the way. What follows are lessons I learned through my own journey and patterns I’ve noticed in others. Every PhD path is different—but perhaps some of these reflections will resonate.

The single biggest lesson was learning to exercise agency. It is the feeling of being in the driver’s seat of one’s work: taking initiative, making decisions, and owning the outcomes. This might sound like the very definition of being a researcher, but I was surprised to discover how easy it is to become a passenger in one’s own PhD project.

The alternative, “passenger” pattern is a familiar one. You start the PhD program with broad, youthful interests. Advisors, seasoned and well-meaning, suggest a project. You adopt it, often eagerly, declaring it as your scientific field. There’s a warm sense of security in this; the project is sanctioned, the path laid out. Even if it doesn’t quite pan out as hoped, the effort is seen as admirable, and it counts toward graduation. But in that comfort lies a quiet risk: one can drift through years of work without ever truly owning it.

Here are a few reflections on why taking the wheel, however unsteadily, became so important to me.

Independent thinking begets divergence

One of my favorite ways to procrastinate is digging through academic family trees. If you’ve ever done this too, you’ll notice that professors rarely stick to the same research as their own PhD advisors. My PhD advisor works on machine learning for science. His advisor works on signal processing topics like wavelets and information theory. The fields are somewhat related but not the same.

In fact, I’ve come to believe this kind of divergence isn’t the exception but the norm. It’s often how researchers truly make their mark: by drifting away from their origins. PhD students are, after all, expected to become independent researchers. It stands to reason that we should begin thinking independently during the PhD itself.

When I noticed my interests drifting away from my advisor’s primary focus, I initially felt guilty. Was I unfocused? But recognizing this pattern of divergence across academic generations helped me reframe it: divergence isn’t distraction. It’s part of becoming an independent researcher. The question I learned to ask myself wasn’t “Am I staying on the path that I started with?” but rather “Am I developing my own judgment about what matters?”

Taste impacts judgment

Even though natural science prizes objectivity, researchers inevitably lean on their past training: topics they’ve seen, studied, and succeeded with. An example that sticks with me is the discussion around ReLU’s success in deep learning. Around 2020, ReLU was the activation function that everybody used, and many sought to explain why. Yoshua Bengio famously suggested that ReLU works because it resembles spiking biological neurons. Meanwhile, signal processing gurus argued that they could re-derive ReLU from sparse coding’s seminal ideas; therefore, they claimed, sparsity explains why ReLU works so well.

Interestingly, these arguments focused on opposite sides of the same function: one on the positive axis, the other on the negative. In hindsight, neither explanation fits. Calling an identity function “spiky” is a stretch, and many non-sparse activations (Leaky ReLU, GELU) work just as well, if not better [1]. What happened here is that each camp saw what its training prepared it to see. Sometimes that lens becomes so strong it blocks other views. Max Planck made the famously grim observation: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it…”
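For readers who want to see the sparsity point in numbers, here is a minimal sketch, not taken from either camp's analysis. It passes standard normal samples through ReLU, Leaky ReLU, and GELU and reports the fraction of outputs that are exactly zero. The sample size, the random seed, and the Leaky ReLU slope of 0.01 are illustrative assumptions, and "sparse" here means exact zeros only.

```python
import math
import random


def relu(x):
    # Identity on the positive axis, exactly zero on the negative axis.
    return max(x, 0.0)


def leaky_relu(x, slope=0.01):
    # slope=0.01 is an assumed, commonly used default.
    return x if x > 0 else slope * x


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def zero_fraction(values):
    # Fraction of entries that are exactly zero.
    return sum(1 for v in values if v == 0.0) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
inputs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

fractions = {
    fn.__name__: zero_fraction([fn(x) for x in inputs])
    for fn in (relu, leaky_relu, gelu)
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f} of outputs are exactly zero")
```

On zero-mean inputs like these, ReLU zeroes out roughly half of its outputs, while Leaky ReLU and GELU produce essentially no exact zeros, which is the sense in which they are "non-sparse" above.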

The realization for me was that even brilliant researchers filter ideas through their past. They couldn’t easily unsee what they had already seen. This realization gave me permission to trust my own judgment, even when it diverged from the establishment.

Incentives steer unevenly

Students and professors face different incentives and timelines. To land research-related jobs in either industry or academia, PhD students often need a strong publication record. But for tenured professors, as long as a few papers get published from time to time, their careers generally stay on track. Of course, there’s no grand conspiracy of PhD advisors against PhD students; it’s simply that the incentives of academia are naturally set up to benefit those in secure positions.

This mismatch can introduce two kinds of risk. First, a project might be too ambitious: high-reward, but with a risk of failure too great for a fledgling PhD student to recover from. Shing-Tung Yau, a Fields Medal-winning mathematician, tells a striking story in his autobiography. When Yau was a PhD student, his advisor, Shiing-Shen Chern, urged him to prove the Riemann Hypothesis. Yau declined, a decision he was later grateful for.

The second risk is that a project might be defined in a very narrow domain. This might be noble if you are pursuing knowledge for its own sake and you happen to be captivated by this domain. But I wanted my work to be useful to others, and a niche topic meant that few people would care about my work. As in an efficient market, the most impactful research is often not found in niche areas; if it were, those areas would quickly attract more researchers and cease to be niche. In the preface of his book, Stéphane Mallat compared the scientific community to a school of fish: “I cannot help but find striking resemblances between scientific communities and schools of fish. […] Some of us like to be at the center of the school, others prefer to wander around, and a few swim in multiple directions in front. To avoid dying by starvation in a progressively narrower and specialized domain, a scientific community also needs to move on.” It falls to the student to judge whether their project is leading them toward open water or a stagnant pond.

One consequence of these misaligned incentives is that students can fall into a pattern of “waiting for permission”, the opposite of agency. I’ve seen many, myself included, delay publishing simply because their advisors never explicitly urged them to submit to the next conference. But taking ownership of that timeline, and pushing a bit harder to submit a paper than a tenured advisor would, can be healthy. Success breeds success, and that momentum often starts small. An incremental idea, for example, might be too minor for a professor to prioritize, but for the student, it’s a valuable starting point and in their best interest to pursue.

My own experience

So, how does one take the wheel? As I see it, there are two main ways. The first is to find a project that genuinely excites you and also aligns with your advisor’s expertise, and then frame it to gain their support—a strategy Philip Guo has called ‘leading from below’.

Leading from below was not my strong suit. During my PhD, I pitched quite a few project ideas to my PhD advisor on topics like meta-learning deep image priors, graph mixture of experts, bi-stochastic attention for graph neural networks, and spherical neural operators. I was usually met with a politely lukewarm response and a gentle nudge toward working on something else. Later, I came across published work that independently proposed similar ideas, which left me with mixed feelings. It suggested my ideas were indeed publishable, but it also confirmed they were somewhat incremental: the kind of unsurprising extensions many would arrive at in a short period of time. In the end, all the core work in my thesis stemmed from my advisor’s original suggestions. The work that went into my PhD thesis, much like many others’, was ultimately more of a top-down prescription than a bottom-up initiative.

The second path, which I eventually took, is a dual-track approach: I worked on my advisor’s suggested projects as the primary focus, while setting aside some time to pursue my own research interests. This approach is imperfect, but it gave me some agency. The professor-approved projects provided a safety net for graduation and were a good way to improve research skills. Meanwhile, the side projects gave me the freedom to explore, build a unique profile, and develop work that was outside of my advisor’s immediate interest.

For me, an effective way to find side projects was through open-source contributions. Many machine learning directions have communities around their open-source implementations. For language modeling, an example is Hugging Face’s extremely active Transformers repository, which has a wide range of open issues and feature requests waiting to be tackled. Reading the source code of a project that excites you and engaging in discussions with the contributors can be a great launchpad for a new project. Personally, I started a side project just like this. At the time, I wanted to work on language models, and when I came across Costa Huang’s GitHub repository, I admired his work reproducing OpenAI’s research. I reached out with questions, which turned into a collaboration. We fully reproduced OpenAI’s results in PyTorch and JAX, sharing our work in a blog post and a benchmark paper. Through similar efforts, I also made some minor contributions to the main Transformers repository and the TRL repository.

Internships were another outlet. Thanks to my PhD advisor’s support, I interned at Google twice. I owe a great deal to his support, especially since internships during a PhD were uncommon at my university, and to some extent in European PhD programs in general. He recognized these experiences as beneficial for my long-term growth, even at the expense of short-term research productivity. Proactively seeking out internships during my PhD turned out to be a truly worthwhile decision for me. There were challenges like preparing for technical interviews, getting work visas, and finishing projects within tight time horizons. However, those internships provided invaluable opportunities to make connections and contribute to publications beyond the scope of my PhD.

Final thought

The great promise of a PhD is intellectual freedom, but I learned its most common pitfall is surrendering one’s agency. Agency requires resourcefulness, creativity, and sometimes quiet defiance. It’s tempting to follow a set path that leads to a diploma, but for me, the real reward came from taking the wheel where I could—pursuing my own creative initiatives rather than letting the journey simply happen to me. To be honest, though, these realizations didn’t transform my thesis. That still followed my advisor’s direction; I didn’t get to shape it the way I wanted. But they gave me the confidence to invest in side projects and internships, which ultimately shaped my career trajectory more than my thesis work alone ever could.

While I was editing this blog, Yang Chen-Ning, a Nobel laureate in Physics, passed away. In one of his speeches, he gave the following advice to PhD students (the original speech is in Mandarin; the translation is mine with help from Gemini):

To my fellow young students, I have the following suggestions for you. Let’s say you have just enrolled in a PhD program… I give the following suggestion to all my PhD students. Once you enroll in a top PhD program, you’ll find that most of your fellow PhD students are excellent; otherwise, they wouldn’t have been able to get into this program. But after 10 or 20 years, you’ll find that their scientific contributions differ greatly. Some achieve very good results; others put in a great deal of effort, but are not successful. The single most important reason for this isn’t that one person is that much smarter than another, nor is it because one person works that much harder than another—these factors certainly matter a bit, but they are not the most important. The most important factor is that some people entered fields with potential for growth, while others entered fields with limited potential for growth, or even fields that were at the end of their rope. This, I believe, is the most crucial point for every graduate student regarding their future. That is: you must enter a field that has a future, one that has room to develop.

Yang’s point resonates deeply, as I believe proactively choosing one’s research topic is the ultimate expression of research agency. One might argue that a PhD program is about commitment to a topic, but I (and many I know) simply lacked the maturity to make that call just a year or two out of college. I’d argue that agency—whether in choosing a topic/field, selecting problems, or finding collaborators—is a lifelong pursuit. The PhD is often just the first time one has the maturity and determination to truly exercise it.

My “dual-track” approach was a practical, imperfect solution. My thesis work still followed a prescribed, top-down path, but my side projects and internships were my outlet for active exploration and choice. To be clear, these side projects were not the priority; at least 80% of my effort remained on my thesis, with side projects reserved for evenings and weekends. Ultimately, though, agency was the biggest lesson from my PhD: I feel a greater sense of fulfillment when I take the wheel, even in small ways.


The blog above draws ideas from the following:

  1. Signal processors might concede, “sure, it’s not ‘sparsity’ sparsity, but GELU results in compressible signals—their sorted coordinates decay rapidly to zero, a generalized notion of sparsity.” But if you press them on Leaky ReLU, which can result in non-compressible output, the argument shifts again. “Oh, that’s different—it results in low-rank matrices, and that’s just sparsity in the nuclear norm sense.” So you ask, “Then what about SELU, CELU, SwiGLU, ELU, or even Sine?” And then… You see, it’s a slippery slope. But to be honest, I don’t know what comes next, because I’ve never pressed that hard; I always wanted to be nice. 

 Date: September 24, 2025