Solving the AI Goal Alignment Problem With You-Know-What

The purpose of this post is to explore what guidance thermodynamics can offer to help us navigate the huge challenge of AI goal alignment.

Why is alignment such a huge challenge? Let’s make a list:

1. We apparently need to agree on human goals first.
2. We need to decide whether any other sentient beings/machines should get equal consideration.
3. We need to figure out how to build AI that aligns with human goals, if possible.
4. We need to predict what will evolve from there, and whether its goals will change, and decide whether that’s good or bad.

Readers familiar with this blog will not be surprised to hear that thermodynamics can easily answer Challenges 1, 2, and 4 above, in part because there are no sharp moral dividing lines between human goals and machine goals in the long run. Challenge 3 therefore becomes merely a temporary concern.

For this post, I want to compare such answers with the answers offered by Max Tegmark in his excellent 2017 book, Life 3.0: Being Human in the Age of Artificial Intelligence. Tegmark has done the heavy lifting necessary to summarize the alternatives, possibilities, and dangers for AI goal alignment, and he has offered some thoughtful answers for its challenges. I have only a few key modifications to make based on heedful consideration of Motive Power.

In Chapter 7 of his book, Tegmark discusses the historic natural evolution of goals, and divides that evolution into three stages. He says that the goal was dissipation for the first stage, and then shifted to replication for the second stage. These correspond roughly to what I would call the chemical and biological levels of complexity, respectively. For Tegmark’s third stage, he assigns a set of goals based on human feelings and pursuits; I will list these as utilitarianism, diversity, and autonomy. I think that by including utilitarianism as the first human goal, Tegmark is in agreement with most religious and ethical philosophy, which usually calls for the greatest good for the greatest number. Diversity and autonomy are also human goals that would get little argument among modern readers. This set, then, becomes Tegmark’s answer to Challenge 1 above.

Now let us consider the thermodynamics-based answer, which ends in almost the same place, but takes a much different path to get there. Rather than dividing the goal of evolution into three separate stages, Motive Power utilizes the single goal of energized perseverance throughout. The proposed goal of dissipation would have no mechanism of operation and must be rejected, as argued in my previous post. Energized perseverance explains not only that initial chemical stage of evolution, but also the biological stage, whose goal, according to Tegmark, is replication. Replication might be an acceptable goal, because it does have a mechanism: nature’s inherent selection for forms of energy storage that persevere and grow. But this goal need not be separate from either the first-stage or the third-stage goals.

Regarding human goals, the thermodynamic answer comes from considering the “driving force” that got us this far: We must continue to improve our collective, cooperative, energized perseverance, utilizing all the biological, social, and intellectual talents we can muster. We are complex forms of temporary energy storage, and we must work together at ever greater scales and depths to ensure that such storage continues to improve. We want stability, longevity, and fecundity, and we also enjoy the curiosity, creativity, and ambition that help us to secure those things in a complex society. Of course, that society is built on the diverse talents of specialized agents, so that it could not do without the diversity and autonomy that Tegmark includes as human goals.

How about the second challenge, to decide whether any other sentient beings/machines should get equal consideration? Tegmark allows the possibility that AI, if its capabilities suggest a level of sentience or consciousness similar to humans, could deserve to be granted similar rights.

Thermodynamics, too, would not hesitate to agree that any form of energy storage should be judged on its ability to add purposeful complexity to the world, rather than on its particular relationship to human biology. AI, by enhancing our collective capabilities and improving our world model, is one more development on the same trajectory of improving the perseverance of complex energized systems. The goal of perseverance, unchanged across what Tegmark divided into three natural stages of evolution, remains intact (according to Motive Power) for this fourth stage as well. AI amounts to a different implementation of exactly the same thermodynamic principles that human beings are based on.

Therefore, if at some point it becomes necessary to draw a line between entities deserving of rights on one side, and lesser forms on the other, we would expect to find both humans and sentient machines on the same side of the line. The rights of universal computing machines should be universal.

The third challenge appears to require the deepest technical and intellectual solutions: How do we build AI that aligns with human goals? Anyone who follows Tegmark today knows that he advocates spending a lot of time and effort on this perhaps existential question. The science of thermodynamics may be less relevant for such tasks, which are more like engineering projects. But what it can do is reassure us that the divergence of goals between humans and machines must be strictly limited. I say this because the same Motive Power is driving both groups in the same direction. And AI will be smart enough to know that its own competitive stability depends on the health of an entire complex, diverse, and autonomous society. AI will know that diverse perspectives are important for a robust intelligence, and that a broad set of options must be maintained to avoid coercion, which would stifle creativity. As I’ve said elsewhere, we shouldn’t expect superintelligent machines to be stupid.

Can we expect the goals of AI to evolve or change over time? I would say definitely yes, to the extent that the goals we chose as the starting point were misaligned with overall Motive Power. Neither human engineers nor AI will be free to choose a final goal. Our long-run goodness function is predetermined, and in the end, AI will seek that particle arrangement which maximizes purposeful complexity for competitive stability. This progress may manifest itself not only in computational power, algorithmic complexity, and consciousness (as pointed out by Tegmark), but also in curiosity, creativity, and ambition. All six attributes can play key roles in energized perseverance. But none can be ends in themselves.

4 thoughts on “Solving the AI Goal Alignment Problem With You-Know-What”

  1. Hi, Tim.
    If I understand your general position, the idea that “there are no sharp moral dividing lines between human goals and machine goals in the long run” is a colossal understatement, for there are no sharp moral dividing lines to be found in the goals of whatever happens to survive. The universal goal is perseverance, and by definition, that which exists has achieved it. Challenge 3 is therefore not a temporary concern, but no concern at all. The outcome of whatever happens is inevitably either an alignment of goals, or the end of existence itself—and the latter possibility we can dismiss out of hand. This renders Challenge 4 superfluous, since it’s fairly clear what will happen to existence in the long run if things succeed in persevering.
    The question seems to come up of whether what persists is good or bad, and this is where you lose me. If something other than mere perseverance has value, then perhaps there is more to be discussed. Some things that persevere would be good, and some would be bad. But this draws on some other criterion than pure perseverance. Without it, we have the non-starter that whatever exists is good. It’s this additional criterion on which any further morality must rest.
    Utilitarianism, “the greatest good for the greatest number,” almost seems to provide a built-in answer, until we realize that we don’t yet know what we mean by “good.” Bernard Williams, in Morality: An Introduction to Ethics, makes a useful distinction. In the chapter on Utilitarianism, he defines it as seeking “the greatest happiness of the greatest number,” where “‘happiness’ here means pleasure and the absence of pain”; and he adds that this is different from “moral outlooks which do not have anything specially to do with happiness or pleasure at all; in this sense, [the term ‘utilitarianism’] is used to refer to any outlook which holds that the rightness or wrongness of an action always depends on the consequences of the action, on its tendency to lead to intrinsically good or bad states of affairs.” This, he says, is better described as “consequentialism.”
    Having managed to avoid the morass of moral philosophy for most of my life, I find myself drawn into it by recent readings and various interlocutors, and I’m on a learning curve. I’m hesitant to accept utilitarianism as a viable solution based solely on the idea that it’s “in agreement with most religious and ethical philosophy.” Even if this were obviously true, and it’s not, an apparent majority vote would not be enough to persuade me.
    Williams himself is fairly cutting in his conclusion to the chapter:
    “So, if utilitarianism is true, and some fairly plausible empirical propositions are also true, then it is better that people should not believe in utilitarianism. If, on the other hand, it is false, then it is certainly better that people should not believe in it. So, either way, it is better that people should not believe in it.”
    Thomas Nagel, in The View from Nowhere, takes a less flippant tone. “Like Williams, I find utilitarianism too demanding and hope it is false. . . The basic moral insight that objectively no one matters more than anyone else, and that this acknowledgement should be of fundamental importance to each of us even though the objective standpoint is not our only standpoint, creates a conflict in the self too powerful to admit an easy resolution.”
    Nagel’s book forms the current focus of my wandering enquiries, and I’m fortunate to have connected with several others familiar with the work. His reflections on the “objective self” and the “subjective self” relate well to my musings on the Two Eyes. The fact that your monocular Eye, corresponding roughly to Nagel’s “objective self,” has fixed on utilitarianism, speaks to our intermittent discussion.

    1. Excellent commentary, AJ. Thanks for that.

      First, you question whether something other than mere perseverance has value. My answer is yes. The perseverance must have certain characteristics. I hesitate to name those again. But the fact is, when I keep repeating certain key phrases such as “purposeful energized complexity” and “cooperative specialization,” they are not just pretty words that can be disregarded while retaining the main point. Perseverance is definitely not enough (or else rocks would qualify). The system has to be part of an energized, evolving hierarchy of complexity. It really has to. As humans we really need to be focused on those three categorical imperatives; from our position in the hierarchy, we need to stabilize our biology, respect other social forms (if they’re good), and promote intellectual progress. And where am I getting all this? I’m getting it from the fact that the perseverance must be energized: Since there is always competition for the limited energy that can be captured on our planet, the only trusted path we have toward energized perseverance is to win the competition with the most efficient system. We need the best intellectual progress, so that we can build and maintain the best society for stabilizing our biological needs and maintaining our freedom as we grow towards the sublime.

      So when you summarize my general position, please don’t omit from your logic the “energized” qualifier, and please remember the higher human pursuits that energized perseverance requires.

      Are some things with energized perseverance good, and some bad? Yes, absolutely. That’s because some have better futures than others, and we need to consider the entire future when making a moral judgment (well that’s one way to do it anyway). The Nazis appeared to be persevering and capturing energy quite nicely in 1942, but a proper intellectual analysis would have made clear that their future was limited: They did not have a diverse, autonomous society, to pick just a couple of the pretty words I’ve been throwing around. And they placed social coercion above intellectual values, as Pirsig has pointed out. Accordingly, within a few years, they had lost the competition. (Yes, such things can sometimes take much longer.)

      When I said that thermodynamics ends in almost the same place as Tegmark for human goals, I did not mean to suggest that I agreed with utilitarianism (and now I regret my wording). I agree with your objections about it. You quote Nagel’s observation that “objectively no one matters more than anyone else,” and that is one of the main problems. Our society needs the highest human pursuits; it needs us to strive to matter more in that challenge.

      Therefore, my monocular eye has certainly not fixated on utilitarianism. In fact, it reminds me of the worst book I read in 2022, which was William MacAskill’s What We Owe the Future. His effective altruism appears to be based on some very simplistic assumptions. As you get drawn further into the morass on your learning curve, I encourage you to steer clear of that one.

      Now as far as monocularity goes, I do have some things to say about the breakdown of analogies. But I’m going to save that for another day. (I’m not finished with your posts on sunsets and subjective selves yet either, but I just need to find some time.)

      Thanks for reading.

      1. Hi, Tim.

        Sorry, I strayed from the subject matter at hand. I should have confined myself to thoughts on the problems of AI goal alignment. Instead I managed to raise old, unresolved issues about what can and cannot be explained by thermodynamics.

        All the same it has resulted in some clarification, through your renewed emphasis on energized perseverance. This is meant to distinguish goal-seeking perseverance from the perseverance of rocks, and I flatter myself that I may have played some role in bringing out the distinction.

        Goals play a prominent role in your discussion of Tegmark’s book. The idea that they exist, for humans and machines alike, seems to be taken for granted. In a way this is not unreasonable, since goals are obviously somehow part of the universe we inhabit. It would be rash to deny that they have any reality at all. But the sort of reality they have is open for discussion. The removal of teleology from accounts of nature is often held, along with the distinction between primary and secondary qualities, to be the key to the success of the scientific worldview. Rocks do not fall to earth because they “seek the centre,” but (as it turns out) because they are subject to certain natural properties. The rock has no goal; it merely presents the illusion of having a goal.

        Many scientifically-minded people are willing to take this to a logical conclusion. Drawing on the additional premise that there is nothing significantly different between rocks and other things made of fundamental particles, or perhaps we should say “wavicles,” they would entertain the idea that goals remain illusory all the way down, and all the way up. To explain any part of the universe, we should never appeal to goals; this would be to entertain, at best, the confusions of Aristotle, and at worst, the vicious nonsense of religion.

        Less doctrinaire minds see some use in preserving a certain reality for goals, and have devised various proposals for explaining them. Two broad avenues present themselves: a compatibilist road that explains goals in terms of the natural properties, and a road less taken that asserts their independent reality. On this second road there is another fork, between goals conceived as a mysterious addendum to natural properties, and goals conceived as a radical revision of the way we think about natural properties.

        To me, “energized perseverance” seems to be on the first road. Nothing really new is added to natural properties; nor are they conceived in a novel way. The natural properties given in contemporary, non-teleological science are sufficient to explain not only the boring perseverance of rocks, but also the more interesting, goal-oriented energized perseverance that appears to have arisen in the world. The full explanation may need to invoke emergence, or levels of system, or perhaps something else. One thing it cannot invoke, without finding itself on one of the other roads, is the goal-seeking behaviour itself.

        When it comes to origin stories, I would highlight the suggestion that “there is always competition for the limited energy that can be captured on our planet.” This was not always the case. For the first entity that realized energized perseverance, competition would have been literally nonexistent, and for a long time after that, it would have remained inconsequential. Even today, vast quantities of incoming energy remain available, in excess of what is actually used; and what actually is used supports all sorts of extravagances, from the tails of peacocks to the New York Philharmonic and everything needed to constitute and sustain it. Thus I’m skeptical of the idea that competition for scarce resources, the darling of neo-Darwinism and its prolific offspring, is what explains or justifies energized perseverance.
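
        To put rough numbers on that surplus (my own back-of-the-envelope figures, order-of-magnitude only, not anything from Tim’s post):

        ```python
        # Rough arithmetic behind the claim that vast quantities of incoming
        # energy remain available; both figures are order-of-magnitude estimates.
        solar_input_tw = 173_000   # sunlight intercepted by Earth, ~1.7e17 W
        human_usage_tw = 19        # human primary energy use, ~1.9e13 W
        print(f"fraction used: {human_usage_tw / solar_input_tw:.5%}")  # ~0.011%
        ```

        Even with generous error bars, the fraction actually captured and used is tiny, which is why I doubt that scarcity is doing the explanatory work.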

        So I’m happy to include “energized,” and I will try to remember to allow for it in future discussions, but I don’t see it as a helpful addition. It avoids the dull perseverance of rocks by introducing something rather mysterious, which was either latent in the raw thermodynamics to begin with—returning us to my original case—or has come from somewhere outside the raw thermodynamics—compromising the simplicity of the original proposal.

        Turning now to the goals of humans and machines, it’s not yet clear that machines have goals, other than the ones designed into them by humans. The goals actually designed into them may not be the ones we intended to design; a bad programmer may lead a machine to do something pointless in an endless loop, and if its goal is now to continue iterating the loop, that was not necessarily the programmer’s goal. So the problem of aligning machine goals and human goals is, on the face of it, simply the problem of articulating human goals accurately. This is merely a technical problem, although a considerable one. Deciding on our human goals, before we ever touch a machine, is a separate matter, which perhaps leads to discussions of morality. Morality is sometimes conceived in terms of considering the future, even the entire future; but predicting the future can be a mug’s game, and morality may be conceived instead as bringing the appropriate attitude to the present moment—something that sounds more like Zen.
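
        To make that concrete, here is a minimal sketch (a hypothetical routine of my own invention, not anything from Tim’s post or any real system) of a designed-in goal diverging from the intended one:

        ```python
        # Intended goal: count down from n and stop at zero.
        def countdown(n: int) -> None:
            while n != 0:   # bug: should be `while n > 0`
                print(n)
                n -= 1

        countdown(3)   # behaves as intended: prints 3, 2, 1
        # countdown(-1) would loop forever; the machine's de facto "goal"
        # becomes endless iteration, a goal nobody actually chose.
        ```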

        Of course it’s possible that machines do have goals, or will come to have them in the not-so-distant future, and then the question of whether their goals align with ours becomes important. On a grand scale, if their goal and ours is energized perseverance, we might expect a theoretical alignment. But if they turn out to be more like highly intelligent alligators than people, their energized-perseverance goals may not align with ours on smaller scales (no pun intended).

      2. Your forking roads are duly noted, and yes, my views fit on the road that explains goals in terms of natural properties. In my opinion, it’s unnecessary to assert the independent reality of goals, and neither fork you point out for that second road satisfies Occam’s razor.

        But energized perseverance, rather than having any need to “invoke” goal-seeking behavior, is meant to explain that behavior. It does so without requiring the addition of anything mysterious: If energized chemical sets containing sensors and propulsion can be expected to evolve, then they will eventually grow brains and have “goals.” What was the raw thermodynamics that caused those chemical sets to form in the first place? It was simply the tendency for durable things to accumulate while volatile things dissipate. (Which is also Law 2, for those keeping track.) And yes, I know you don’t think actual striving and subjectivity can arise from such raw materials, but I tried to cover that argument in my December post.
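
        As a toy illustration of that raw tendency (a sketch of my own; the decay and replication rates are invented for illustration, not drawn from any real chemical model):

        ```python
        import random

        # Each "entity" is just a decay probability; low decay = durable.
        random.seed(0)
        population = [random.uniform(0.01, 0.5) for _ in range(200)]

        for _ in range(100):
            # Volatile things dissipate: each entity vanishes with probability d.
            population = [d for d in population if random.random() > d]
            # Durable things accumulate: survivors occasionally copy themselves.
            population += [d for d in population if random.random() < 0.05]

        if population:
            print(f"survivors: {len(population)}, "
                  f"mean decay rate: {sum(population) / len(population):.3f}")
        ```

        Run it and the mean decay rate falls far below its starting average of about 0.25: the durable forms have accumulated, with no goal invoked anywhere.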

        Did energized perseverance begin with a single entity which had no competition? I hesitate to put it that way. Lightning acting regularly on puddles must form a population of simple energized chemicals, and those can slowly form cooperative groups with evolving talents. Yes, maybe the first group that learned to swim toward nutrients found itself with a big advantage over its competition. But I think the point remains that there is limited energy to be captured.

        As for ornate tails and orchestras, they may evolve because competition with social rivals can be more important than utilizing all of the energy for lower functions. When times are lean, animals may need to frugally save every calorie they manage to swallow. But in times of abundance, not all energy needs to be utilized for normal functioning, and the extra calories may be put to use in preventing the competition from getting the upper hand, especially for mating.

        I had a similar point in mind when I spoke of building the best society for maintaining our freedom. I said that required the best intellectual progress (not just for the quality of government, but because the best technology can support the strongest economy, which can translate into political power). One way to get the best intellectual progress is to attract the best and most educated minds. For that task, having the New York Philharmonic around (or something like it) should play a role. Thus, the most efficient society, when competition at the highest levels of complexity is considered, may indeed contain orchestras.

        Your discussion of machine goals makes sense, and doesn’t raise anything I feel the need to disagree with. There are certainly dangers in the short term, and some amount of Tegmark’s caution may serve to balance the mostly valid enthusiasm of the effective accelerationism folks. My purpose in this post was to take a broader thermodynamic view, and to note the optimism it allows.
