value function learning
2025 retrospective, part 1
this has been a year of defied expectations. actually, they weren’t just defied. they were turned on their heads and then run over by a couple of cars.
going into this year, i thought that by now i would’ve published a couple hot papers, gotten a real job (at least one that doesn’t kick me out after 3 years), maybe even gotten engaged. today i can report that i at least have retained my current job (still getting kicked out in 1.6 years) but i haven’t gotten a paper accepted all year and it’s now been 3 months (and counting) that i haven’t run a single experiment or written a single word of a new paper. by all accounts, my life has kind of (read: completely) derailed this year: i’m now 30 and single, i make less than i did at 22, and i’m not even doing my job right. for no earthly reason, though, i’m actually really happy.
i think the greatest lesson i’ve learned this year is that i can’t predict the future. this is obvious, you say, everyone knows this. but now i don’t even try to predict the future, and that’s made all the difference.
historical context: i used to be a big strategizer. most high-achieving type A immigrant kids can relate. what extracurriculars can i pack in to stand out to colleges? what internships should i do to maximize my chances of a cushy full-time job after graduation? my brain ran on a hamster wheel, always trying to figure out how to get to where i wanted to go next. i was the type of person who loved asking people what their 5 year plan looked like and lowkey judged them if they couldn’t give me a direct, well-thought-out answer. now i think 5 year plans are stupid. even 1 year plans are kind of stupid. actually, i refuse to sit here right now and surmise about what i will or hope to be like a year from now when i write another one of these, because the only thing i know for certain is that whatever my life turns out to be at that point, it’ll be so much more awesome than i could ever conjure up sitting here today. (i’m so fucking excited to have this statement validated in one year, keep me accountable.)
i think i leaned so heavily on strategizing because i was scared. scared that if i didn’t take all the “right” steps i wouldn’t land on the “right” tiles in my Game of Life, and then it’s to hell with my meticulous 5/10/20 year plan. i’ve been fortunate to have grown up getting constant reinforcement that i was smart and good at stuff and had a bright future. i never really knew what that meant though, other than that i should keep going toward the light. in school the light is obvious: take harder and harder classes and try to do well in them. then school ended and i was dropped into the real working world, where infinite paths to infinite futures lay before me.
forks in the road were hard. is being a PM a “brighter future” than being a software engineer? what about leaving my nascent but promising career in tech in san francisco behind for grad school? i vividly remember meeting up with a professor sometime during my freshman year of college and asking him how a person is supposed to know what they’re meant to do. i gave him the analogy of a specific Mario Kart stage (Koopa Cape maybe? and yes i’m serious i actually did this) with these tunnels with water running through them, and if you drive in the water you’re way faster than the people driving on the sides. i wanted him to tell me how i could find that water for my life and my career, that elusive thing that i was uniquely poised to be better at than anyone.
i should digress here to say that the “brighter future” talk makes it sound a lot like i was struggling under the weight of familial expectations. for those who don’t know, it’s a classic story: immigrant parents place high expectations on a promising kid in hopes that giving their kid a shot in the promised land will be worth everything they left behind. i have to clearly refute this suggestion: my parents are actually chill af, at least as far as immigrant parents go. of course they want me to be successful, but over the years we’ve come to a joint understanding that success looks different for everyone and they’ve learned to trust that i know what that means for me. i could not be more grateful for them, and i would not have gotten here without their constant support and encouragement. i am the sole perpetrator and victim of this mental framework: i was so so afraid of not living up to my own expectations.
digression aside, more questions emerge: what is the geometry of this path to a bright future? how do i optimize my path locally: does brightness increase monotonically? how do i know if i am in a local minimum? what is the metric governing this space and how? can i find? the fucking geodesic?? that is what i really wanted to get out of the poor professor who may or may not have played Mario Kart before in his life. i just want the shortest, straightest path to get there, isn’t that a straightforward ask?
these days i’ve been studying a lot of reinforcement learning, which has at the very least taught me a useful piece of vocabulary: the value function. the value function (strictly speaking, the optimal one) simply maps each state in the system to the maximum total reward you can collect along any trajectory starting at that state. the true value function for a given system is basically a treasure map: you can read off the optimal trajectory that will lead to maximum reward. fucking finally! someone has a map! as long as i can learn the true value function for life, i’m so set!
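for the curious, here is a minimal sketch of what "reading the treasure map" could look like. everything in it is made up for illustration: a hypothetical 5-tile chain world where only the last tile pays out, a discount factor `GAMMA`, and the helper names `step`, `value_iteration`, and `greedy_path`. it uses plain value iteration, which is one standard way to compute an optimal value function when you already know the rules of the world (the part life does not give you).

```python
# Toy example (all names and numbers are assumptions for illustration):
# a chain of 5 tiles; stepping onto the last tile gives reward 1 and ends the game.
GAMMA = 0.9          # discount factor: later rewards count a bit less
N_STATES = 5
GOAL = N_STATES - 1  # terminal tile
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Deterministic dynamics: move along the chain, clipped at both ends."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def value_iteration(tol=1e-9):
    """Repeatedly apply the Bellman optimality update until values stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal tile: no future reward, value stays 0
            outcomes = (step(s, a) for a in ACTIONS)
            best = max(r + GAMMA * values[nxt] for nxt, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_path(values, start=0):
    """Read the map: from each tile, take the action with the best one-step lookahead."""
    def lookahead(s, a):
        nxt, r = step(s, a)
        return r + GAMMA * values[nxt]

    path, s = [start], start
    while s != GOAL:
        best_action = max(ACTIONS, key=lambda a: lookahead(s, a))
        s, _ = step(s, best_action)
        path.append(s)
    return path


if __name__ == "__main__":
    v = value_iteration()
    print([round(x, 3) for x in v])
    print(greedy_path(v))
```

in this toy world the values climb as you approach the goal tile, so following them greedily walks you straight there. the catch, of course, is that `step` is handed to you here.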
if that wasn’t immediately funny, something is wrong: no one knows the true value function for life. either it doesn’t exist or the constraints of our fleshy existence make it impossible to adopt a sufficiently exploratory policy to learn it. i haven’t fully thought out this part of the analogy, open to community feedback. regardless, before i had words for any of this i tried learning a value function lots of ways. i sought perceived oracles: my parents, professors, self-help books and podcasts authored by self-proclaimed self-made successful people. when those could only go so far i looked around at my peers. grad school is a twisted and somehow more dystopian version of the american high school trope. instead of gossiping about which girl got a new designer bag or painfully obvious boob job we gossiped about how O wrote 50 first-author papers during his PhD and got a Stanford professorship out of it, and how to get status in our collaboration to artificially boost paper and citation counts. this was great for me — this value function was not only straightforward, it was even quantifiable! just publish loads of papers and get loads of citations! unfortunately, the clearer a value function is, the easier it is to adopt and the harder it is to let go of.
vignette 1: 3 months ago, on one of those new york summer days where the blanket of humidity was finally about to break in a torrential rainstorm, W and i sat outside in chinatown eating 豆腐脑. i was tense: it was about to start raining at any moment and, way more importantly, i was 2 weeks away from a paper deadline and my results weren’t lining up. as a seasoned value function optimizer, i did not care why. i was simply pissed off, at the end of my rope, and two failed experiments away from fudging some hyperparameters to get the numbers to add up (come on, i would never fudge the numbers themselves!). at some point i said to W, “a good researcher would find these negative results interesting. this deadline is making me a shit researcher.” i would like to say that realization broke the world open for me that day, but i killed myself for the next 2 weeks getting those results and only bailed at the last second when i had to accept that shit was not gonna come together in time.
since then, i’ve almost left publishing behind. don’t get me wrong: i love sharing ideas, i value the rigor that defines a good academic paper and the ideal publication process, and i absolutely don’t mean this as a treatise to convince anyone to quit publishing if it’s working for you. but i realized that my value function and its pressure to publish were making me a shit researcher and i can just choose to be a good researcher instead. so, this is the story of how i stopped looking around for some value function to pick up off the side of the street and started looking inward instead (a tired trope, but it’s true).
vignette 2: it’s an idyllic fall morning, and i am journaling on an amtrak ride up to new haven. i like boats and trains, i think a lot better when things are moving around me. and i like journaling, it’s like talking to myself but socially acceptable plus i can read it back. at some point i read back the last line i wrote, which includes:
…feeling like my old value function (do more projects, publish more first-author papers) might be not entirely matched with my next goal (some kind of cool job in AI)…
slight tangent: i had just attended a talk by an incredible grad student, P, that week. i understood almost none of it, but one of the only things i understood was that P’s work undeniably made something new possible for his research community/subfield. maybe his subfield is 1 person (it’s not), but P changed something for someone. this led me to write:
actually, upon writing this out i feel like that’s not the real goal — i just want to be a really good researcher that can contribute something useful.
that was probably the first time i’d ever dictated my own value function, and i am in love with it!! here are some reasons why:
it is simple: do good work that helps people.
it is easy to understand how to optimize locally: work hard making progress toward something cool every day.
it is infinite-sum: i can win and my friends can all win! and the more people win the more useful stuff we make!
it can mean a lot of things: my career goal used to be the name of a job. this new goal can mean anything — maybe even the same job. but if that’s how it turns out i’ll be doing it for a reason, not because i want the name of some job.
it is not about anyone else: remember the whole dialogue about fear? the old value functions gave me so much to fear. i had to look at other people to adopt their value functions, and when i judged them by their value function i felt judged. if my value function was theirs, i had to win out by that metric to look good. but now my value function is mine and i stand alone.
it is not about me: i lived in fear of being too dumb or too old or too far behind for what i wanted to achieve. every time i struggled with a new concept or failed at a derivation i told myself those things and they told me to give up. to me, the most beautiful thing about this goal is that it’s about something bigger than me — it doesn’t matter if nothing comes easily anymore, you trudge through it until you’re out the other side and you’ll be one step closer to the goal.
caveat: this is my value function, not yours! i’m not trying to convince anyone that this is the “right” one! if you’re trying to adopt my value function, you have totally missed the point! if you get nothing else out of this, the tl;dr for this whole post is ignore other people’s value functions and go find your own value function!
we have a saying in chinese (thank you to A for showing me this): 山高万仞 只登一步. the mountain is 10,000 (ancient unit)s high, it all starts with one step. this sounds familiar and motivational in that familiar cheesy way, but my read is this: if you look too far ahead you will trip and fall flat on your face. i always planned so far ahead that everything i was doing was always for something else. i wouldn’t do something because i wanted to, i would do it because i wanted what i thought would come next. high school was for college and college was for a job and there’s a funny break here where i left and went to grad school but grad school was for a postdoc and a postdoc was for another job. studying was for grades and research was for papers and papers were for jobs. so yes, right now i still only have a job that will kick me out in 1.6 years (1.59999 now), but i think i’ll be better at studying when it’s just for studying and i’ll do better research when it’s just for research. the scientist in me has to gather more data first before saying that definitively, but for now, i’m sure as hell having a lot more fun.


Thanks for writing this, it clarifies a lot. This piece really hit home, and I appreciate how honest you are. It feels like you've been learning to define a whole new value function for your life, calibrating it with your own internal happiness metrics instead of external ones. So insightful, and honestly, a relief to read.
I really enjoyed this. It reads like it was written in a couple of passes and left alive, which is rare and nice.
As an ex–five-year-plan immigrant kid, a lot of this hit uncomfortably close haha
I like that you drag the value-function idea over from RL into life. Yes, it collapses a ridiculous amount of nuance into a single number, but that’s also exactly the move most of us end up making without noticing. What stayed with me wasn’t the technical analogy so much as that moment on the train.
I’ve subscribed. I’ve been writing more from a systems-engineering angle about AI and the mess it creates; if you ever wander over, you might find a few echoes of what you’re circling here.