Over the last two-plus years, there has been an unprecedented boom in applications of artificial intelligence, and its use in the field of testing has also grown rapidly.
However, I don’t see many people talking or writing about how to test AI itself, or the models behind it (also called Large Language Models, or LLMs). On this page, I’ll try to list the articles, resources, videos, and blog posts I can find on the subject. So if you’re interested in this field, read through the list below.
- Trustworthy Retrieval-Augmented Generation with the Trustworthy Language Model
- RAG vs Finetuning LLMs – What to use, when, and why.
- Regression Testing with LangSmith
- Regression Testing with LangSmith – Video
- Comprehensive Guide to reducing LLM Hallucinations
- Possibilities of use of AI in Testing
- RAG for Quality Engineers
- Mitigating Hallucinations in LLMs by Enhancing Pre-training Data Quality
- AI model for Browser Automation
- https://github.com/dipjyotimetia/jarvis
- Open source RAG with Llama 2 and LangChain
- Frameworks in Focus: ‘Building and Evaluating Advanced RAG’ with TruLens and LlamaIndex Insights
- Weights and Biases in Machine Learning
- Three phase LLM learning
- Transformer Model
- Supervised vs Unsupervised Learning
- Machine Learning vs Deep Learning
- PDF retrieval using LlamaParse
- Foundational Models
- What are LLMs
- What is an LLM – Cloudflare
- LLM – Techopedia
- What are LLMs – AWS Amazon
- Large Language Model Training in 2024 – AIMultiple
- What is LLM – Elastic
- Masked Language Modelling
- Building Product Knowledge Graph in DoorDash
- RAG vs Large Context Models
- What is a RAG
- Building and evaluating a RAG with TruLens
- RAG with BigQuery and LangChain
- Diagnostic Tool for Deep Neural Networks
- Evaluating Large Language Models: A Complete Guide
- Evidently AI – Evaluation of LLM
- LLM Evaluation Guide
- Best LLM
- Essential Idea of a Neural Network
- Advancements in RAG Technologies and Techniques
- GraphRAG
- LLM Observability with Grafana
- Master RAG in 5 hours
- Using Evaluations to Optimize a RAG Pipeline: from Chunkings and Embeddings to LLMs
- What We Still Don’t Understand About Machine Learning
- The Tech Buffet #15: Build and Evaluate LLM Applications with TruLens
- Using Giskard – an e2e evaluation framework for LLMs
- Introduction to Giskard: Open-Source Quality Management for AI Models
- Giskard: The testing framework for ML models, from tabular to LLMs.