Publication detail

Evaluation of Search Engines and AI Chatbots Using A/B Testing

Authors: Škopljanac-Mačina D. | Mesic Tomislav

Year: 2025

Publication type: paper in conference proceedings

Source title: 2025 MIPRO 48th ICT and Electronics Convention : proceedings

Publisher: Croatian Society for Information, Communication and Electronic Technology - MIPRO

Place of publication: Rijeka

Pages: 373-378

Titles:

Language	Title	Abstract	Keywords
cze	Vyhodnocení vyhledávačů a chatbotů s umělou inteligencí pomocí A/B testování	Chatboti s umělou inteligencí se dnes často používají k učení a vzdělávání, aniž by se zohledňovala kvalita odpovědí, které poskytují. V tomto článku jsme vyhodnotili kvalitu odpovědí na uživatelské dotazy z populárních vyhledávačů a chatbotů s umělou inteligencí. Naším cílem bylo objektivně určit pomocí A/B testování, zda je chatbot ChatGPT lepší v odpovídání na různé uživatelské dotazy než Vyhledávání Google. Pro vyhodnocení populárních vyhledávačů a chatbotů s umělou inteligencí vytvořených pomocí node.js a různých API jsme použili naše vlastní webové rozhraní. V našem experimentu jsme navrhli sadu testovacích dotazů ve formě faktických datových otázek, matematických problémů a logických hádanek. Vyhodnotili jsme odpovědi z různých modelů ChatGPT a vyhledávače Google na všechny testovací dotazy. Následně se prostřednictvím našeho webového rozhraní automaticky spustí metoda A/B testování, abychom zjistili, zda existuje statisticky významný rozdíl mezi kvalitou jejich odpovědí. Došli jsme k závěru, že pro naši testovací sadu dotazů není statisticky významný rozdíl mezi starším modelem ChatGPT 3.5 a vyhledávačem Google. Zjistili jsme však, že model ChatGPT 4 je lepší v odpovídání na naše testovací dotazy než Vyhledávání Google a rozdíl je statisticky významný.
eng	Evaluation of Search Engines and AI Chatbots Using A/B Testing	AI chatbots are now often used for learning and education without regard to the quality of the answers they provide. In this paper we evaluated the quality of responses to user queries from popular search engines and AI chatbots. We aimed to determine objectively, using A/B testing, whether the ChatGPT chatbot is better at answering various user queries than Google Search. We used our own web interface, built with node.js and several APIs, to evaluate popular search engines and AI chatbots. In our experiment we devised a set of test queries in the form of factual data questions, mathematical problems and logic-based riddles. We rated the responses from various ChatGPT models and the Google Search engine to all the test queries. Afterwards, the A/B test is run automatically through our web interface to determine whether there is a statistically significant difference between the quality of their answers. We concluded that, for our test query set, there is no statistically significant difference between the earlier ChatGPT model 3.5 and the Google Search engine. However, we found that the ChatGPT model 4 is better at answering our test queries than Google Search, and the difference is statistically significant.	AI chatbots; search engines; ChatGPT; Google Search; A/B testing
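
The abstract does not say which statistical test the web interface runs, so the following is only a minimal sketch of one common A/B testing approach: a two-sided two-proportion z-test on the number of test queries each system answered acceptably. The function name, the pooled-variance z-test itself, and the counts in the usage example are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Assumed setup (not from the paper): variant A and variant B each
    answered n_a / n_b test queries, of which successes_a / successes_b
    were rated as acceptable.
    Returns (z statistic, two-sided p-value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All answers rated identically in both groups: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call; not the paper's data.
    z, p = two_proportion_z_test(successes_a=10, n_a=100, successes_b=30, n_b=100)
    alpha = 0.05  # conventional significance level, also an assumption
    print(f"z = {z:.3f}, p = {p:.5f}, significant: {p < alpha}")
```

With graded rather than pass/fail ratings, a test on mean scores (for example a t-test) would be the closer analogue; the z-test above applies only if each answer is scored as acceptable or not.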
