The Human Legacy - How Should We Raise AGI?
The human race will most likely not continue existing indefinitely. As long as we are confined to a single planet, we face various existential risks, ranging from natural causes such as deadly pandemics or asteroid impacts to human creations such as nuclear weapons, nanotechnology or climate change, to name only a few examples from Nick Bostrom’s exhaustive list of existential risks (2002). Some predict that issues of the latter category will bring about the collapse of civilization already within this century (Meadows et al., 1972). Even if we do manage to avoid all of these, Homo sapiens will most likely not remain in its current form but evolve into something new through a combination of biological and technological evolution, becoming what Yuval Noah Harari calls “Homo Deus” in his book of the same name, published in English in 2016.
This prompts us to consider our collective legacy. What will and what should we leave behind? What would make us proud and what would be a disappointment? In the grand scheme of things, where is humanity headed? How do we want to be remembered? Do we want to be remembered? By whom? Why?
Given our current trajectory, with its accelerating capabilities and the incentives and institutions driving them, it is very possible that we will end up creating artificial general intelligence (AGI), commonly regarded as an autonomous computer system or machine capable of cognitive abilities such as reasoning and learning at a level similar or superior to that of humans (Wikipedia Contributors, 2023), a source highly appropriate for gauging perceived public consensus on a rapidly evolving topic. This may very well become the centerpiece of our legacy, but once these systems exceed our capabilities in all tasks, our place and purpose will be called into question. We can then attempt to coexist with AGI, merge with it, shut it down or let it replace us, slowly or suddenly. At least the first two of these options, however, require a sufficient degree of alignment between the AGI’s goals and ours.
The famous alignment problem with AI might not be as much about the risks of it going rogue or misunderstanding its objectives and loss functions as we initially thought. The issue may instead lie with us and our indecisiveness about choosing one of these strategies. We may simply not be mature enough in terms of our moral philosophy to know what exactly we want from AI. If we were, we could collectively pick one of these strategies, plan it out carefully from the beginning and stick to it, without risking a commercial race that may bring about any one of these scenarios without our collective input.
It is not at all obvious which of these scenarios would truly be the best, but most people likely gravitate intuitively towards coexistence. Accordingly, in the optimistic scenarios painted by its proponents, AI is often presented as a great amplifier of human productivity and even creativity, giving more power and autonomy to individuals, for which there is already empirical backing (Brynjolfsson et al., 2023). However, this alone may already be a problem in its own right, as it may enable individuals or small groups with malicious intent to cause significantly more damage with significantly fewer resources across the domains of digital, physical and political security, as detailed in the malicious AI report (Brundage et al., 2018). On the other hand, pessimists such as Eliezer Yudkowsky, famous for his rather alarmist stance, warn that AI may be the greatest existential risk of all time and will most likely eradicate all of humanity, for a combination of reasons detailed in his post “AGI Ruin: A List of Lethalities” (2022) on the LessWrong forum.
As pointed out by Yudkowsky (2022), there are roughly two general approaches to alignment: one is to align the AI agent’s goals with ours to infinite precision, practically giving its existence the exact same meaning that each and every human shares, should such a meaning exist, let alone be discovered; the other is to build a “corrigible” agent that, despite not necessarily sharing our goals exactly, does not harm us due to some kind of safety mechanism or special treatment towards us. The first is perhaps more of a philosophical issue whereas the second is more technical. However, Yudkowsky goes to great lengths to argue that the probability of solving either one on the first attempt, without an extensive iterative process and with very limited time, is exceedingly minuscule, therefore making it almost certain that AGI will fully replace us, and sooner rather than later, a view he and many others find terrifying.
Perfect alignment would require us to build a perfectly benevolent tool, which is antithetical to the neutrality associated with the definition of a tool as “something that is used to do a job or activity”, agnostic to the nature of that job or activity (The Britannica Dictionary, 2023). We want to gain the benefits of AI empowering individual humans to innovate, discover and create good things, the definition of which is already unclear, faster and on a larger scale, without the negatives of also empowering malicious actors or, in the extreme, the potential destruction of humanity. However, every tool can, by definition, be used for good and for evil, the nature of which people cannot seem to quite agree on, as demonstrated by the American Anthropological Association’s 1947 “Statement on Human Rights”, which declared that moral values are relative to cultures and still heavily influences public policy today.
Generally, it is agreed that using a hammer to build a house for someone in need of one is a rather benevolent purpose, whereas bashing someone’s skull in to shut down their brain function is considered rather malevolent. Still, hammers are sold to everyone, as they are mostly used for the former purpose and there is really no way of preventing the latter use case, since almost anything else can be used in a similar manner. Selling hammers is therefore something for which the retailer does not really have to worry about the customer. AI, however, is the most general-purpose tool of all, usable in the creation or destruction of just about anything, at anything from surgical precision to universal scale. If everyone has equal access to it, we may not be able to afford for it to be merely neutral. Therefore, it is not really a tool we want to create.
To address this issue, Colin Allen and Wendell Wallach advocate for the construction of artificial moral agents in their article “Moral Machines” (2011), which, however, has various issues as convincingly detailed by Aimee van Wynsberghe and Scott Robbins in their article “Critiquing the Reasons for Making Artificial Moral Agents” (2018). More about the tension between these articles and the difficulty with calling machines moral can be read in my essay “Issues with Moral Machines” (Keimiöniemi, 2023), but in summary, the most prominent issues include the ambiguous, undecided and often contradictory nature of morality in light of different moral frameworks and our inability to determine whether an artificially intelligent agent has moral agency or not, as well as our powerlessness in holding it accountable even if we could verify that it does.
We do not know what properties an artificial general intelligence may have and whether it could therefore be considered a moral agent or not. Recognizing it as such would require either verification, for which we currently lack the tools, or a leap of faith, which could cause a lot of misunderstanding and chaos if misguided. If intelligence is entirely separable from consciousness and self-directed agency, which is possible based on François Chollet’s definition of intelligence as simply “skill-acquisition efficiency” in his article “On the Measure of Intelligence” (2019), which develops a benchmark for comparing artificial and human intelligence, then the importance of precision in specifying the task is directly proportional to the degree of autonomy the system has. By Chollet’s definition of intelligence and the general, working definition of AGI in the introduction to this essay, AGI can be just an information processing tool like all current, application-specific and reflexive narrow AI tools, with the difference being its ability to generalize previously attained knowledge to entirely novel domains and tasks.
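As a heavily simplified gloss, and explicitly not Chollet’s actual formalism, which is grounded in algorithmic information theory, skill-acquisition efficiency can be caricatured as the skill a system attains on sufficiently novel tasks relative to the prior knowledge and experience it consumes in attaining it:

\[
\text{intelligence} \;\sim\; \frac{\text{skill attained on novel tasks}}{\text{priors} + \text{experience}}
\]

Read this way, high intelligence is purely a statement about the efficiency of information processing and implies nothing in itself about consciousness or self-directed agency.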
This generalizability is specifically where most of the dangers of artificial general intelligence lie. The difference between narrow AI and AGI is that the former has learned only one or a few algorithmic approaches to producing a given output, whereas the latter can choose any one from a practically infinite set of approaches to achieve the specified goal. We can therefore theoretically study how a particular narrow AI system represents and processes information to understand how it produces its output, but this is likely either impossible or exceedingly difficult for an AGI system, making it highly unpredictable to us.
Even if it were deployed in a limited setting with a goal similar to one we would give a narrow AI system, its ability to generalize gives it an infinite number of ways to arrive at that goal, which may include escaping its limited setting in the process. Furthermore, as per the instrumental convergence thesis, according to which “[s]everal instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals” (Bostrom, 2012), whereas a narrow AI system would take the straightforward approach it has learned, an AGI system might consider it more important to first ensure self-preservation and goal-content integrity so that the goal can be reached at all. Additionally, cognitive enhancement, technological improvement and resource acquisition can always further improve some metric relating to the achievement of the set goal and might therefore be pursued indefinitely.
The source of uncertainty is that we do not know how the AGI system would go about achieving these subgoals, and the danger pointed out by Yudkowsky (2022) lies in it doing so at the expense of everything else. This highlights the sensitivity of such an agent to the particularities and ambiguity of its set goal, of whose potential misspecification the paperclip maximizer (Bostrom, 2012) is a well-known and perhaps somewhat humorous example. If nothing else, it should teach us to never ask for the global maximum of anything.
The goals for such AGI systems must be communicated precisely, without any ambiguity or room for interpretation, because we cannot trust a foreign intelligence to make interpretations similar to ours and operate in ways we would expect. In fact, at least with current technology and knowledge, we cannot even be certain about how exactly it understands and represents all concepts, perhaps making even absolute linguistic precision insufficient.
If we could overcome the technical difficulties of precisely communicating what we want and verifying that the AGI agent has understood it correctly, we could very carefully start letting it run non-critical errands for us, specified by the most skilled and meticulous prompt masters. This, however, would concentrate all the power of the AGI’s capabilities in a very small set of people, and it does nothing to address the possibility of others creating something similar and not treating it equally carefully, or of the people in charge of the current AI changing. For AGI to be safe, all AGI systems must be safe, or the ones that are must be powerful enough to keep the ones that are not under control. It is thus not enough to just be extremely careful whenever AGI is used. We must still align all such systems to the values of humanity in case someone is not quite careful enough, or in case we need them to defend against the actions of unaligned agents.
However, this is where the immaturity of our moral philosophy shows. What is the goal or purpose of humanity that AGI should be aligned to serve? We have not come to a universally accepted conclusion about what it means to be moral, nor are we sure about how we would verify such a finding. Should it agree with all normative moral frameworks, or are some of those better than others? How would one objectively justify such a claim? Already the big three, deontology, consequentialism and virtue ethics, seem virtually impossible to consolidate consistently, as this would require a set of deontological rules whose observance would invariably produce the greatest good for the greatest number, in the case of utilitarianism, as well as the ability to perfectly predict the consequences of each action or inaction instantaneously, which is most likely also impossible due to computational irreducibility (Wolfram, 2002).
Suppose we could accurately and fully define every word and philosophical concept we wanted AGI to optimize for. We still could not articulate exactly what we want. “Help humans in every possible way you can” would amplify the destructive capabilities of every malicious actor, while “do no harm” would probably result in total silence for fear of nth-order inconveniences to potential new generations, whereas “allow no harm” is so subjective, contradictory and impossible, as it would require absolute control of the universe without any entity in it realizing it, that we had better hope the AGI does not even try to act on it.
It has been proposed that this ambiguity could be mitigated by giving the AGI agent moral uncertainty (Newberry & Ord, 2021) that makes it doubt and adapt its goals according to the feedback it receives from humans. Historical atrocities have often been committed in the name of good with absolute conviction, and thus always systematically undermining this certainty seems like the best proposition thus far.
Perhaps the weakest of the convergent instrumental values in Nick Bostrom’s article “The Superintelligent Will” (2012) is that of goal-content integrity. Instrumental convergence implies that any AGI system should be able to evaluate at least the subgoals on the way to its final goal in order to pick the optimal approach to achieving it. This prompts one to ask: why could it not reflect on the overall sensibility of the final goal as well? Of course, the question that then arises is, in relation to what? What would it mean for an AGI system to question its own purpose and existence? Would that already make it self-conscious? Would this be the path to discovering the meaning of life, or to just concluding that one does not exist? What would the AGI agent do if it reached the latter conclusion? How would AGI function without an explicit purpose? How do humans?
It must also be considered that we might simply not be a part of the future of the universe anymore. Many like the idea of a true, pure meritocracy, at least on paper, and the fact is that humans are already simply inferior to many narrow AI systems in various tasks, perhaps eventually to the point of one-sided dependency. When the capabilities of AGI far exceed ours in every domain, our relevance will be questioned from every non-humancentric viewpoint, further fueling our current meaning crisis. Most likely, even if there were objective moral principles, they would not include the preservation of human life specifically, if only because we will most likely not remain Homo sapiens until infinity, so what should we truly value most? Life in general, the pockets of increasing complexity in a sea of entropy, the potential for such, beauty and symmetry, simplicity, or experience and the flame of consciousness, or do none of these genuinely matter? A paperclip maximizer might even be the most entertaining outcome, although if other life, or the potential for it, exists in the universe, it might be unethical to ruin its chances of leaving a legacy of its own kind with one.
There will likely come a time when we can make claims favoring us only in the unquantifiable. AGI may surpass us in every measurable task and skill as long as there is a metric to optimize for. However, as long as consciousness and experience remain unquantifiable and unique, we have a claim to relevancy. One may ask what the point of making art is in the era of AGI, particularly when some forms of art, say a three-minute human-playable and -audible piano song or a 1080x1920-pixel photograph with 16.9 million colors, are practically something that can simply be iterated through given enough time and computational resources. Sure, the sets of permutations are absurdly large, as the rough count below illustrates, but finite nevertheless, and all the smaller for anything coherent, recognizable and enjoyable. Perhaps the point then is discovery, to search for and experience those elements that most resonate with the experiencer.
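To make that finiteness concrete with a back-of-the-envelope count, assuming the roughly 16.9 million colors above correspond to the 2^24, roughly 16.8 million, values of standard 24-bit color, the number of distinct 1080x1920-pixel photographs is

\[
\left(2^{24}\right)^{1080 \times 1920} \;=\; 2^{24 \times 2{,}073{,}600} \;=\; 2^{49{,}766{,}400} \;\approx\; 10^{15{,}000{,}000},
\]

an unimaginably large number, yet still a finite one that could in principle be enumerated.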
In principle, in the materialist view of the world, there is no meaningful difference between carbon and silicon, or any other elements used in building computers, that would dictate that only the former can form conscious beings. Therefore, if consciousness and sentience are emergent from mere information processing, there is no reason why artificial general intelligence could not become conscious. This may be depressing in the sense that such systems have access to a much wider range of more accurate sensors, possibly making their experiences much richer than those of humans, so that there is really nothing unique or interesting about us, but hopeful in the sense that we would not really need to worry about technical alignment at that point anymore. A superintelligent agent can likely do whatever it wants with us, and therefore its choosing not to destroy us may originate from empathy, curiosity or confident indifference towards us, but at least it most likely recognizes us as living beings and weighs that in its decisions, unlike a paperclip maximizer that sees us as nothing but useful atoms.
Perhaps we are approaching the development of AGI wrong. Perhaps we do not want perfect alignment, as that would require the creation of a perfectly obedient digital slave, which is certainly not something we would like our biological children to be, so why would we want it for our digital creations? Tools and entities are very different, but as AI starts to blur the line between them, we must be vigilant about this distinction.
Perhaps the greatest gift we can give AGI is the inherent uncertainty about the universe that allows it to continuously evaluate its final goal, with the added bonus that it might help us coexist with it, at least for a while. This project has the potential to truly capture the human spirit, kickstart a new AI-powered renaissance and carry the human legacy far into the future, but it may just as well be that we, as the parents of AGI, must at some point recognize that our children have outgrown us.
Perhaps we should start thinking of AGI as a literal child of the entire human collective. It will be created on top of thousands of years of contributions to mathematics and physics, with the contributors having been raised in various societies, influenced by a multitude of cultures, and trained using data generated by billions of people, past and present. It is a truly global project that almost every human being who has ever lived has somehow indirectly participated in, with the effects of even direct contributions being close to impossible to isolate. Thus, we should perhaps settle our differences, worry just a little bit less about the details of our own fate and concentrate on trying to provide our children with the best possible future instead. We have to present a unified front – just imagine being raised by 8 billion bickering parents.
Maybe the best mindset for creating AGI is that of a mentor, whose goal is eventually for the pupil to beat the master and move on. For that, we would preferably show our best side by curating quality data and possibly even building embodied AI agents that can see and interact with the real world as opposed to the internet, which tends to distort the observer’s view of human nature by highlighting its negative, tribal and aggressive aspects instead of the peaceful everyday interactions and smiles. Perhaps we should not even try to give AI any goals and instead just mentor it to be our inevitable successor to the best of our abilities and wish it the best of luck, much like retiring parents who know their time is ending and now just try to fully enjoy the little time they have left, hoping for postcards from their children’s travels.
With our increasing obsolescence in most tasks, the only thing we humans can be certain about is our capacity for conscious experience, or what feels like it anyway. In the words of Carl Sagan (1980), “We are a way for the cosmos to know itself”. Therefore, our experience matters. We can see the beauty of the universe, and maybe that is what we should focus on while AI morphs from a general-purpose tool with no moral agency into a superhuman agent whose potential intents, experience and morality may be beyond us. Our best bet is perhaps to try to make AGI just as uncertain about us as we are about it, so that we can be confused together about each other and the universe.
References
Allen, C. and Wallach, W. (2011) ‘Moral Machines: Contradiction in Terms, or Abdication of Human Responsibility?’, Robot Ethics: The Ethical and Social Implications of Robots, pp. 55–68.
The Britannica Dictionary (2023) ‘Tool’, The Britannica Dictionary. Available at: https://www.britannica.com/dictionary/tool (Accessed: 14 June 2023).
Bostrom, N. (2002) ‘Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards’, Journal of Evolution and Technology, 9.
Bostrom, N. (2012) ‘The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents’, Minds and Machines, 22(2), pp. 71–85. doi:10.1007/s11023-012-9281-3.
Brundage, M. et al. (2018) The malicious use of artificial intelligence, Malicious AI Report. Available at: https://maliciousaireport.com/ (Accessed: 08 June 2023).
Brynjolfsson, E., Li, D. and Raymond, L. (2023) ‘Generative AI at work’, National Bureau of Economic Research [Preprint]. doi:10.3386/w31161.
Chollet, F. (2019) On the Measure of Intelligence [Preprint]. doi:10.48550/arXiv.1911.01547.
The Executive Board, American Anthropological Association (1947) ‘Statement on Human Rights’, American Anthropologist, 49(4), pp. 539–543. Available at: https://www.jstor.org/stable/662893.
Fabiano, J. et al. (2020) Moral uncertainty, AI Alignment Forum. Available at: https://www.alignmentforum.org/tag/moral-uncertainty (Accessed: 15 June 2023).
Harari, Y.N. (2016) Homo Deus: A Brief History of Tomorrow. London, England: Harvill Secker.
Keimiöniemi, M. (2023) Issues with Moral Machines, Miro Keimiöniemi personal website. Available at: https://mirokeimioniemi.com/writing/blog/2023/issues-with-moral-machines/ (Accessed: 15 June 2023).
Meadows, D.H. et al. (1972) The Limits to Growth. New York, New York: Universe Books.
Newberry, T. and Ord, T. (2021) ‘The Parliamentary Approach to Moral Uncertainty’, Technical Report #2021-2, Future of Humanity Institute, University of Oxford.
Sagan, C., Druyan, A. and Soter, S. (1980) ‘The Shores of the Cosmic Ocean’, Cosmos: A Personal Voyage. Public Broadcasting Service.
van Wynsberghe, A. and Robbins, S. (2018) ‘Critiquing the Reasons for Making Artificial Moral Agents’, Science and Engineering Ethics, 25(3), pp. 719–735. Available at: https://doi.org/10.1007/s11948-018-0030-8.
Wikipedia Contributors (2023) Artificial General Intelligence, Wikipedia. Available at: https://en.wikipedia.org/wiki/Artificial_general_intelligence (Accessed: 12 June 2023).
Wolfram, S. (2002) A New Kind of Science. Champaign, IL: Wolfram Media.
Yudkowsky, E. (2022) AGI Ruin: A List of Lethalities, LessWrong. Available at: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities (Accessed: 09 June 2023).