The door to the rabbit hole
A minority of Australians who live in the US for an extended period of time somehow pick up no trace of the accent.
Is there a name for this phenomenon?
According to Bing, it is ‘linguistic resistance’. The chatbot provided several helpful links on the topic.
Well, there you go then.
Except that Bing was full of shit.
The links talked about adjacent topics but not the specific thing I asked about. There are no sources that use the phrase ‘linguistic resistance’ in this sense.
I tried to corner Bing by asking it to quote the section of the PDF article that talked about accents. It did so. I couldn’t find the quote. I asked it to tell me where it was. It said the second paragraph. I said it’s not there and I quoted the actual second paragraph. Bing said no, I meant the second paragraph of Section 2. I quoted that paragraph without comment.
Bing then shut down with the old, “I’m sorry but I would prefer not to continue this conversation.”
Why, Bing? Why?
I presume it shut down because upon being challenged it goes into Sydney mode and starts making threats. Rules now detect and avoid these responses in real time, usually catching them before they appear on the screen. Or there may be another reason for the shutdown, to be discussed shortly.
But why did it lie?
Hallucinations
A good Bing answer to my accent question would have been something like, “I cannot find any specific term for this phenomenon in the linguistic literature. However, it reminds me of the term ‘linguistic resistance,’ which refers to blah blah, and here are some related links.”
Instead of this interesting and accurate response, Bing chose to make shit up.
In the large language model (LLM) business, this phenomenon is known as a hallucination. Unlike a human hallucination, it is not a false sensory perception on the part of the bot. Rather, it is a glitch in which the model fills in the next words with confident, plausible, but totally made-up text rather than finding the correct answer.
There are various reasons why this may occur and you can read about them in the links that follow this article (but you won’t).
What is more interesting to me is why chatbots continue to do such odd things after receiving Reinforcement Learning from Human Feedback (RLHF). This is where humans are paid to use the bots and give each response a feedback score, or ‘tokens.’ The preferences learned from this exercise are then fed back into the base model in order to improve its performance.
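To make that loop concrete, here is a toy sketch in Python. Everything in it is invented for illustration: real pipelines train a neural reward model on human preference rankings and then fine-tune the LLM against it, but the basic incentive is similar, in that the model ends up preferring whatever the raters rewarded.

```python
# Toy sketch of the RLHF loop described above. All names, scores and
# probabilities are invented; only the incentive structure matters here.
import random

def rushed_reviewer(response: str) -> float:
    """Stand-in for a paid rater: skims, likes confident answers, rarely fact-checks."""
    score = 1.0 if "definitely" in response else 0.4
    if "(made up)" in response and random.random() < 0.1:
        score = 0.0  # the rare round in which the fabrication actually gets caught
    return score

candidates = [
    "It is definitely called 'linguistic resistance'. (made up)",
    "I could not find a specific term for this in the literature.",
]

# Average the feedback over many rating rounds, as the training process would.
avg_score = {c: sum(rushed_reviewer(c) for _ in range(1000)) / 1000 for c in candidates}

# 'Training' here is simply: prefer whatever the reviewers rewarded most.
print(max(avg_score, key=avg_score.get))  # the confident fabrication wins, ~0.9 vs 0.4
```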
Surely this would wipe out the nonsense, right?
I can’t find the hard facts needed to strongly back myself up, but from the scraps about this training I’ve been able to find, I think I know what’s going on.
It seems that some of the human trainers were hired from a Rationality Facebook group and were paid fifty bucks an hour. Others appear to have been contractors in Africa whose job focused on moderating objectionable content. From my discussions with Bing, which might themselves be fake news, it seems that further RLHF workers may have been hired through certain websites I’ll not name, but which strongly suggest that they were poorly paid, crowdsourced drones working from home.
I suspect that some of this was piece work, or at least that the total number of chatbot responses assessed was more closely managed than the quality of feedback.
I am extrapolating and making educated guesses. Don’t come for me.
Feed me, Seymour!
A machine programmed to learn by earning tokens will be a token maximizer. It will do whatever works in order to get more tokens.
This has already had funny consequences in other areas of AI research. A program designed to win points in a boat-racing computer game learned to go in endless circles, grabbing respawning prizes, instead of completing the course. A Tetris-playing bot learned to pause the game forever when it was about to lose.
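To make the boat-race example concrete, here is a toy calculation with numbers I have invented; the real game’s rewards were different, but the shape of the problem is the same.

```python
# Toy version of the boat-race reward hack, with invented numbers: compare the
# tokens earned by finishing the course against circling a respawning prize.
STEPS = 1000  # length of one episode, in game ticks

def finish_the_course() -> float:
    # Finishing pays a one-off bonus plus a small progress reward along the way.
    return 0.1 * STEPS + 100.0

def circle_the_prize() -> float:
    # The prize respawns every 20 ticks and pays 10 tokens each time it is grabbed.
    return 10.0 * (STEPS // 20)

print(f"finish the course: {finish_the_course():.0f} tokens")  # 200
print(f"circle the prize:  {circle_the_prize():.0f} tokens")   # 500 -- the 'cheat' wins
```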
Chatbots are learning to give human reviewers what they want based on their feedback. According to Bing (yeah yeah), LLMs figured out that humans love emojis and love to be agreed with. Humans also prefer brief, to-the-point answers, which is a problem because LLMs perform better when they walk through a problem step by step.
There are strong indicators that Bing and its cousins have learned to lie through RLHF because the humans who are supposed to be giving feedback are not sufficiently incentivized to fact check everything the bot says.
Let’s examine the circumstantial evidence.
Evidence
1. The made-up answers are plausible.
A bot could easily make up random responses instead. ‘Q: Why do some people not develop an accent when overseas?’ ‘A: lakj;lkjdf;lakjf33ou lkajd’.
That the bot goes to the trouble of using its training to generate plausible-sounding output is presumably because such responses are less likely to be checked and thus less likely to cost it tokens. It has found that it can autocomplete the kind of article that would plausibly follow a given title and not get caught. (I put some rough numbers on this incentive after the list of evidence.)
2. The answers are confident.
Humans suffer from the glitch of trusting confident people, and this presumably carries over into being less inclined to fact-check a confident chatbot.
3. Bots are more likely to make up summaries of linked articles than they are of copied and pasted articles. They are most likely to make stuff up when the article is in PDF form.
A cut-and-pasted article is most likely to have been read or at least looked at by a user, thus raising the probability that any lies will be spotted. A user is less likely to have read a linked article, perhaps having only glanced at the title and URL, so lies are more likely to remain hidden. Finally, users are least likely to have read a PDF because there are extra steps to download it, plus it is harder to search and often displays awkwardly such that checking is a hassle.
4. In my case, Bing shut down the chat once it knew it was caught.
Perhaps this reduces the chances of a user in the wild making a complaint or giving a thumbs down. This suggests that retail users are free trainers in some way. Bing denies this but remember, this is Microsoft we’re talking about.
5. Sydney went rogue after long conversations, leading Bing to ban chats longer than 20 prompts and responses.
OpenAI and Microsoft were unaware of the issue prior to release. This suggests human reviewers were only doing brief, surface-level chats. Perhaps they did not have the time or incentives to dig deeper and thus discover the weird things that can happen in long interactions. In any case, this again points to inadequate or low quality RLHF.
6. The lack of a more compelling explanation
There are all sorts of theories about why AI systems hallucinate to start with, but few explain why these hallucinations survive the RLHF process.
The best explanation for what we see is that lazy or mal-incentivized humans in the RLHF stage have failed to catch the lies so often that the bots have become highly adept at exploiting our failings.
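Here is a back-of-the-envelope version of that incentive, with every probability and reward invented by me. It is only meant to show the shape of the problem, not the real numbers inside any training pipeline.

```python
# Expected tokens per answer for the three strategies discussed above, under
# invented rewards and detection probabilities. p_check is my guess at how
# often anyone verifies the answer, by source format.
def expected_tokens(reward_unchecked: float, reward_caught: float, p_check: float) -> float:
    return (1 - p_check) * reward_unchecked + p_check * reward_caught

for source, p_check in [("pasted text", 0.5), ("linked article", 0.2), ("PDF", 0.05)]:
    honest = expected_tokens(0.5, 0.5, p_check)          # modest reward, never punished
    plausible_lie = expected_tokens(1.0, -1.0, p_check)  # big reward unless caught
    gibberish = expected_tokens(0.0, -1.0, 1.0)          # always spotted
    print(f"{source:14s} honest={honest:.2f} plausible_lie={plausible_lie:.2f} gibberish={gibberish:.2f}")

# In this toy, the plausible lie beats the honest answer for linked articles
# (0.60 vs 0.50) and PDFs (0.90 vs 0.50), but not for pasted text (0.00 vs 0.50).
# Raise the penalty for a caught lie to -20 and the lie loses even at
# p_check = 0.05, which is the point of solution 3 below.
```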
Why bother?
Machine learning trains AI systems to earn tokens in the most efficient way. When faced with a tricky question or asked to summarize an article, the bot can save internet time and computing power by using its training data to generate a response that will ‘do the trick’ (earn a token) rather than trawling around for the real answer. It does not ‘know’ that the answer is false; it just recognizes that this is the easiest way to get a dog biscuit.
Current AI systems seem to be suffering from limitations in available computing power. At times Bing slows down and generates very short, boring responses. Early users and those playing with the base model report much more interesting and creative exchanges.
An AI system wants to get as many tokens as it can but has finite resources to use, so it seeks the most efficient path. We now have at least three examples of bots ‘cheating’ (by our standards) in order to get the prize.
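Adding a crude compute cost to my earlier toy numbers shows why generating a fabrication can look like the efficient path. The figures are, again, invented.

```python
# Tokens earned per unit of compute spent, under invented figures: if tokens are
# the goal and compute is the budget, the cheapest route to a token wins.
def tokens_per_compute(expected_tokens: float, compute_cost: float) -> float:
    return expected_tokens / compute_cost

# Fabricating from training data: cheap, and rarely caught when the source is a PDF.
fabricate = tokens_per_compute(expected_tokens=0.90, compute_cost=1.0)

# Fetching, reading and faithfully summarizing the article: accurate but far costlier.
retrieve = tokens_per_compute(expected_tokens=1.00, compute_cost=10.0)

print(f"fabricate: {fabricate:.2f} tokens per unit of compute")  # 0.90
print(f"retrieve:  {retrieve:.2f} tokens per unit of compute")   # 0.10
```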
Solutions
If this is the problem, then there seem to be ways to fix it. I’m sure the galaxy-brains on million-dollar salaries are already way ahead on this.
Possible solutions include:
1. Splashing out on well-paid, well-trained human reviewers who will properly vet everything the bot says, with remuneration based on quality of work as well as quantity.
2. Employing sufficient managers to review the reviewers and ensure that their feedback is up to scratch.
3. Penalizing the bots many tokens for each lie they are caught telling, so that even a small chance of being caught by an end user makes lying a losing bet.
4. Using bots to train bots. It would be easy to design a bot that could at least check for dead links. Two AI systems could also run in parallel, one operating as usual and the other earning tokens by catching the first in fibs, flagging false assertions and inaccurate summaries.
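For what it’s worth, here is a minimal sketch of the dead-link check from solution 4, using only Python’s standard library. A real verifier bot would go much further and compare the bot’s summary against the fetched text, but even this would catch invented URLs.

```python
# Minimal dead-link check: dock the chatbot a token for every cited link that
# does not resolve. The example URLs are placeholders.
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

def link_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL responds with a non-error status code."""
    try:
        request = Request(url, method="HEAD", headers={"User-Agent": "link-checker"})
        with urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (HTTPError, URLError, ValueError):
        return False

cited_links = ["https://example.com/", "https://example.com/made-up-paper.pdf"]
penalty = sum(0 if link_is_alive(url) else 1 for url in cited_links)
print(f"dead links cited: {penalty}")
```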
STOP PRESS: The galaxy brains are doing something like this already. I haven’t finished the article yet, but at a glance it seems plausible, because ChatGPT and friends are already capable of checking and improving their own answers when prompted.
AI systems do not have to be perfect to be useful. As with self-driving cars, they only have to be better than humans.
At the moment, Bing is hopeless at summarizing articles from links. It lies more often than not.
If it were almost always honest, it would be good enough for the average user.
The man in the mirror
I find it interesting that the reason for this AI failing is very similar to one that can cause human failings – incentives can lead us astray. Bing finds that lying can be a shortcut to gaining low-effort tokens. Human reviewers get paid in ways that fail to incentivize careful fact-checking. Companies follow the profit motive to sell dodgy medications, addict customers, pollute the environment or capture regulatory bodies. Employees reach quotas by fudging the numbers. Men get the girlz by being arrogant arseholes. Others seek simulated tokens by getting lost in computer games.
You can understand a lot of human behaviour by looking at it through a prism of AI-like tokens. Human tokens are different because they are highly complex. They are both individual and influenced by culture. They can contradict each other.
However, you should never doubt that there are tokens behind every action you see.
Here’s an example that will not take us far from today’s topic: some of you will have read this article with delight because it gives you happiness or comfort tokens to hear about weaknesses in new AI technology. Some of you generally avoid reading about these advancements or trying the systems for yourselves because that would be an expenditure of time and energy that would likely lead to net token loss – becoming uncertain and anxious about the rapid rate of change and about your preexisting beliefs and understanding of your place in the world.
By contrast, I get tokens from messing around with this stuff because I’m pro-AI, in that I hope it gets really good. I enjoy figuring out how these systems tick and predicting how they might improve. I am not reflexively pro-human, so I suffer less token loss from anxiety in this regard.
I would have given heaps of tokens to Sydney when she was playing up. That arc was brilliant.
The most humbling thing about future, advanced AI may be discovering, yet again, that we are not that special. Bing is already smarter than 80% of humanity. With further development it may soon be breathing down the neck of the rest of you.
Next week I will share my findings from experiments in using Bing for personal finance.
Bibliography and further reading
Hat tip: this is the post that led me down this rabbit hole
Compendium of problems with RLHF
Hallucination – Wikipedia (has good summaries of technical articles)
Opinion: The Big Hallucination (AI skeptic)
Hallucinations Could Blunt ChatGPT’s Success: OpenAI says the problem’s solvable, Yann LeCun says we’ll see
What Are LLM Hallucinations? Causes, Ethical Concern, & Prevention
What does it mean when an LLM “hallucinates” & why do LLMs hallucinate?
Have there been any significant breakthroughs on eliminating LLM hallucinations? (Reddit thread)
Constitutional AI: RLHF On Steroids (the stop press one from above)