🧐 OpenAI models fail own benchmark test?

🫱🏻‍🫲🏻 Meta partners with US Government

TOGETHER WITH INNOVATING WITH AI

Welcome to AI Tool Report!

Thursday’s top story: OpenAI has released a new benchmark to assess the factual accuracy of AI models and found its own models lacking.

❗If you want to become irreplaceable by AI, we suggest you read "IRREPLACEABLE: the Three Competencies of the Future" by Pascal Bornet.

🌤️ This Morning on AI Tool Report

  1. 🧐 OpenAI models fail own benchmark test?

  2. 💼 How to become an AI Consultant

  3. 🫱🏻‍🫲🏻 Meta partners with US Government

  4. 🤖 How to build AI chatbots that understand your business

  5. 💬 How to give a motivational speech using ChatGPT

  6. 🧠 Google making robotaxis smarter?

Read Time: 5 minutes

FACT OF THE DAY

🤔 According to a study by Pegasystems Inc., only 34% of consumers realize they’re using AI, yet when surveyed about the technologies they use, 84% turn out to use at least one AI-powered device or service.

STOCK MARKETS

👀 Keep track of what’s going on in the AI stock market here.

  — — — — — —

BENCHMARK TESTS

OpenAI models fall short on own benchmark?

Our Report: OpenAI has created SimpleQA, a “simple but challenging” open-source benchmark that measures the factual accuracy of large language models (LLMs) to help developers build more reliable AI, but it has found that its own models fall short.

🔑 Key Points:

  • SimpleQA contains over 4,000 questions across topics such as science, politics, and art. Each question has a single correct answer, and a ChatGPT-based grader compares each model’s response with SimpleQA’s reference answer, scoring it as correct, incorrect, or not attempted.

  • OpenAI’s best-scoring model, o1-preview, answered just 42.7% of questions correctly, while its smallest model, GPT-4o mini, scored only 8.6%, which OpenAI attributes to the fact that “the model is small and has little knowledge about the world.”

  • The test also asks models to state how confident they are in each answer, and the testers found that most models overestimate their own accuracy, reporting confidence levels higher than their actual hit rates.
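For readers curious how numbers like these are produced, here is a minimal sketch of the two measurements described above: the headline score (share of all questions graded correct) and a simple overconfidence check (average stated confidence minus the actual hit rate). It assumes a SimpleQA-style three-way grade per answer plus a self-reported confidence between 0 and 1; the names and toy data are hypothetical, and the overconfidence figure is a simplified stand-in, not OpenAI’s own calibration method.

```python
from dataclasses import dataclass

# Grade labels, assuming a SimpleQA-style three-way grading scheme (hypothetical names).
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"


@dataclass
class GradedAnswer:
    grade: str         # one of the three labels above
    confidence: float  # model's self-reported confidence, 0.0 to 1.0


def accuracy(results: list[GradedAnswer]) -> float:
    """Share of ALL questions graded correct (the headline percentage)."""
    return sum(r.grade == CORRECT for r in results) / len(results)


def overconfidence(results: list[GradedAnswer]) -> float:
    """Mean stated confidence minus actual accuracy on attempted questions.

    A positive value means the model rates its answers higher than they deserve.
    """
    attempted = [r for r in results if r.grade != NOT_ATTEMPTED]
    mean_conf = sum(r.confidence for r in attempted) / len(attempted)
    hit_rate = sum(r.grade == CORRECT for r in attempted) / len(attempted)
    return mean_conf - hit_rate


# Hypothetical toy results, NOT real SimpleQA data.
toy = [
    GradedAnswer(CORRECT, 0.9),
    GradedAnswer(INCORRECT, 0.8),
    GradedAnswer(INCORRECT, 0.7),
    GradedAnswer(NOT_ATTEMPTED, 0.0),
]

print(f"accuracy: {accuracy(toy):.1%}")              # 1 of 4 correct
print(f"overconfidence: {overconfidence(toy):+.2f}")  # 0.80 stated vs 0.33 actual
```

In this toy run the model claims about 80% confidence on the questions it attempts but gets only one in three right, which is the kind of gap the testers describe as overestimating its capabilities.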

🤔 Why you should care: These findings are concerning because many people increasingly rely on AI models like ChatGPT and Claude (Anthropic’s best-performing model, Claude 3.5 Sonnet, scored just 28.9%) for research and learning, assuming their answers are factually accurate. Because the benchmark measures only the knowledge these models were trained on (not their ability to answer correctly when given additional context or internet access), the results suggest they aren’t yet reliable enough for independent fact-finding or verification, and perhaps shouldn’t be used as stand-alone sources of knowledge.

   — — — — —

TOGETHER WITH INNOVATING WITH AI

Innovating with AI

The AI consulting market is projected to grow nearly 8X, from $6.9B to $54.7B by 2032.

But how does an AI enthusiast become an AI consultant?

How well you answer that question makes the difference between just “having AI ideas” and being handsomely compensated for your contribution to an organization’s AI transformation.

Thankfully, you don’t have to go it alone – our friends at Innovating with AI have welcomed 300 new students into The AI Consultancy Project, their new program that trains you to build a business as an AI consultant.

Some of the highlights current students are excited about:

  • The tools and frameworks to find clients and deliver top-notch services

  • A 6-month plan to build a 6-figure AI consulting business

  • Students getting their first AI client in as little as 3 days

And as a reader of AI Tool Report, you get early access to the next enrollment cycle.

   — — — —

GOVERNMENT

Meta partners with US Government

Our Report: During Meta's Q3 earnings call, CEO Mark Zuckerberg announced that the company is working with several US government agencies to explore how its large language model (LLM), Llama, could be integrated into public-sector applications to support government initiatives and address societal challenges.

🔑 Key Points:

  • Meta is already working with the US State Department to see how Llama could help address challenges like “expanding access to clean water and reliable electricity in developing states and supporting small businesses.”

  • It has also been talking with the Department of Education about how Llama could simplify the complex funding process for students, and it is in ongoing discussions with several other departments.

  • Meta also confirmed these partnerships aren’t financially driven: rather than seeking payment, it wants to position Llama as an invaluable tool for government use and the go-to platform for federal initiatives.

🤔 Why you should care: This comes as several of Meta’s competitors have built closer ties with government bodies to bring advanced AI capabilities into the public sector, strengthening their positions as key leaders in the AI space. For example, Google has previously worked with the Pentagon on using AI to identify air strike targets, and OpenAI and Anthropic have both recently committed to sharing their AI models with the US AI Safety Institute ahead of release for security screening.

    — — —

TOGETHER WITH CHATNODE

Still drowning in customer questions?

With Chatnode, it's now even easier to build advanced AI chatbots that deeply understand your business, handle inquiries 24/7, and drive more sales.

Here’s why Chatnode is different:

🔍 Reliable Responses: RAG (retrieval-augmented generation) technology ensures consistent and accurate answers.

Easy Setup: Create and launch your chatbot quickly, no coding required.

🔒 Enterprise-Grade Security: Top-tier security and compliance for your data.

🔌 Connects with Your Software: Zendesk, Slack, Google Drive, Notion, Zapier, SharePoint, Dropbox, OneDrive, Make, and more.

💬 Live Agent Handoff: Easily transfers chatbot conversations to human agents.

🤖 Model-Agnostic AI: Choose your LLM from Claude, Gemini, ChatGPT, and Perplexity.

     — —

PROMPT ENGINEERING

Thursday’s Prompt: How to give a motivational speech using ChatGPT

Type this prompt into ChatGPT:

I want you to be a motivational speaker. Put together words that inspire and make people feel empowered to do something beyond their abilities. You can talk about any topic, but the aim is to make sure what you say resonates with your audience, giving them an incentive to work on their goals and strive for better possibilities. My first request is "I need a speech about how no one should ever give up."

Results: After typing this prompt, you will get a motivational speech that empowers people to push themselves beyond their capabilities.

P.S. Use the Prompt Engineer GPT by AI Tool Report to 10x your prompts.

ACTIONABLE INSIGHTS

It’s not too late! Join the AI Reports AI Skill Sprint on Skool and master 6 crucial AI skills in just 6 weeks…

What You'll Learn: Jake George, founder of Synthoria Labs (which specializes in combining AI and automation to replace manual work with efficient, automated workflows), will walk you through the AI agents you can use to automate day-to-day tasks like managing emails, finding leads, analyzing content, and logging invoices, boosting productivity and ROI.

🫱🏻‍🫲🏻 Connect with Jake here.

BREAKING NEWS

   

AUTONOMOUS DRIVING

  • Waymo, the autonomous driving company owned by Google’s parent, Alphabet, is planning to build a new AI model for training its self-driving “robotaxis” that will leverage Google’s multimodal LLM, Gemini.

  • It has released a research paper titled “End-to-End Multimodal Model for Autonomous Driving” (EMMA), which describes how the new model could help driverless vehicles make smarter decisions.

  • Waymo believes Gemini will bring “chain-of-thought” reasoning to its driverless vehicles, helping them tackle challenges such as finding the right route or navigating road obstacles by breaking tasks into steps, as humans do.

We read your emails, comments, and poll replies daily.

Hit reply and tell us what you want more of!

Got a friend who needs to learn more about AI? Sign them up for the AI Tool Report here.

Until next time, Martin & Liam.

P.S. Don’t forget, you can unsubscribe if you don’t want us to land in your inbox anymore.
