Search Results - "LLM evaluation" :: K.UTB search portal

Enhancing Fine-Tuning LLM Evaluation: A Study on Calibration and Metrics for Industry-Specific AI Alignment

by Stavarache, Lucia Larise
Published in 2025 IEEE Conference on Artificial Intelligence (CAI) (05.05.2025)

Conference Proceeding

ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding

by Azime, Israel Abebe, Tonja, Atnafu Lambebo, Belay, Tadesse Destaw, Chanie, Yonas, Balcha, Bontu Fufa, Abadi, Negasi Haile, Ademtew, Henok Biadglign, Nerea, Mulubrhan Abebe, Yadeta, Debela Desalegn, Geremew, Derartu Dagne, tesfau, Assefa Atsbiha, Slusallek, Philipp, Solorio, Thamar, Klakow, Dietrich
Year of Publication 07.11.2024

Journal Article

Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia

by Koto, Fajri
Year of Publication 13.09.2024

Journal Article

ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding

by Israel Abebe Azime, Tonja, Atnafu Lambebo, Tadesse Destaw Belay, Yonas Chanie, Bontu Fufa Balcha, Negasi Haile Abadi, Ademtew, Henok Biadglign, Mulubrhan, Abebe Nerea, Yadeta, Debela Desalegn, Derartu Dagne Geremew, Assefa Atsbiha tesfau, Slusallek, Philipp, Solorio, Thamar, Klakow, Dietrich
Published in arXiv.org (16.11.2024)

Paper

Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia

by Koto, Fajri
Published in arXiv.org (13.09.2024)

Paper

Evaluating Large Language Model Robustness using Combinatorial Testing

by Chandrasekaran, Jaganmohan, Patel, Ankita Ramjibhai, Lanus, Erin, Freeman, Laura J.
Published in 2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) (31.03.2025)

Conference Proceeding

On the attribution of confidence to large language models

by Keeling, Geoff, Street, Winnie
Published in Inquiry (Oslo) (14.01.2025)

Journal Article

HardML: A Benchmark for Evaluating Data Science and Machine Learning Knowledge and Reasoning in AI

by Tidor-Vlad PRICOPE
Published in Studia Universitatis Babes-Bolyai: Series Informatica (02.04.2025)

Journal Article

Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning

by He, Linyang, Nie, Ercong, Schmid, Helmut, Schütze, Hinrich, Mesgarani, Nima, Brennan, Jonathan
Published in arXiv.org (12.11.2024)

Paper

On the attribution of confidence to large language models

by Keeling, Geoff, Street, Winnie
Year of Publication 11.07.2024

Journal Article

Pragmatics beyond humans: meaning, communication, and LLMs

by Gvoždiak, Vít
Year of Publication 08.08.2025

Journal Article

Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models

by Ibrahim, Lujain, Akbulut, Canfer, Elasmar, Rasmi, Rastogi, Charvi, Kahng, Minsuk, Morris, Meredith Ringel, McKee, Kevin R, Rieser, Verena, Shanahan, Murray, Weidinger, Laura
Year of Publication 10.02.2025

Journal Article

Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition

by Fiori, Michele, Civitarese, Gabriele, Bettini, Claudio
Year of Publication 24.07.2024

Journal Article

On the attribution of confidence to large language models

by Keeling, Geoff, Street, Winnie
Published in arXiv.org (11.07.2024)

Paper

Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations

by Pal, Ankit, Sankarasubbu, Malaikannan
Year of Publication 10.02.2024

Journal Article

Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?

by Ghahroodi, Omid, Nouri, Marzia, Sanian, Mohammad Vali, Sahebi, Alireza, Dastgheib, Doratossadat, Asgari, Ehsaneddin, Baghshah, Mahdieh Soleymani, Rohban, Mohammad Hossein
Year of Publication 09.04.2024

Journal Article

LLM-based graduation design thesis intelligent evaluation system

by ZHANG MIN, WU FAN, WANG XINLU, LEI FEI
Year of Publication 04.03.2025

Patent

Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition

by Fiori, Michele, Civitarese, Gabriele, Bettini, Claudio
Published in arXiv.org (24.07.2024)

Paper

Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations

by Pal, Ankit, Sankarasubbu, Malaikannan
Published in arXiv.org (10.02.2024)

Paper

Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language?

by Ghahroodi, Omid, Nouri, Marzia, Mohammad Vali Sanian, Sahebi, Alireza, Dastgheib, Doratossadat, Asgari, Ehsaneddin, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban
Published in arXiv.org (09.04.2024)

Paper
