Hey there! You've stumbled upon our project where we're diving deep into the nuts and bolts of evaluating Generative AI applications, focusing on both Large and Smaller Language Models. This repo is our shared notebook, a place where we document our experiments, findings, and the technical challenges we tackle along the way. Using PromptFlow as our foundation, we're piecing together a framework that's all about getting hands-on and making sense of how to best evaluate and benchmark these complex AI systems. Join us in this technical exploration.
Evaluating LLMs and SLMs presents unique challenges, including the need for continuous evaluation, adherence to responsible AI practices, and the tailoring of evaluation metrics to specific applications. Prompt Flow addresses these challenges by offering:
- Continuous Integration, Evaluation, and Deployment (CI/CE/CD): Implementing LLMOps for effective lifecycle management.
- Responsible AI Practices: Ensuring ethical use and mitigating potential risks.
- Tailored Evaluation Metrics: Customizing metrics for meaningful assessments.
Our choice to integrate PromptFlow into our workflow was driven by its ability to cater to our specific evaluation needs. Here's a closer look at why it's our toolkit of choice:
- Tailored Workflows: PromptFlow's flexibility shines in its ability to let us craft evaluation workflows that are just right for our models. Whether it's offline analysis or real-time testing, we've got the tools we need to put our AI through its paces.
- Comprehensive Testing: The framework supports both offline and online evaluation strategies. This dual approach allows us to thoroughly vet our models in both controlled settings and live environments, ensuring they're up to any challenge.
- Deep Dive Insights: With PromptFlow's advanced tracing and observability, we're never in the dark about how our models are performing. Tracking every input and output gives us a granular view of our AI's behavior, making it easier to tweak, tune, and improve.
PromptFlow enables developers to define and manage evaluation workflows, automate prompt testing, and analyze outputs effectively. Follow our guides to implement your evaluation strategies:
- Define Evaluation Workflows: Utilize Prompt Flow to set up comprehensive evaluation workflows.
- Automate Prompt Testing: Leverage the framework to automate the testing of prompts and analyze outputs.
- Analyze and Optimize: Use the insights gained from evaluations to debug, optimize, and improve your GenAI applications.
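To make this concrete, here is a minimal sketch of a batch run plus an evaluation run with the Prompt Flow SDK. The flow directories (`flows/chat`, `flows/eval-relevance`), the dataset path, and the column names are placeholders for illustration, not files shipped with this repo; the notebook contains the actual flows we use.

```python
# Minimal sketch, assuming a DAG flow in "flows/chat", an evaluation flow in
# "flows/eval-relevance", and a JSONL dataset with a "question" column.
# These paths are illustrative, not files from this repo.
from promptflow.client import PFClient  # older releases: from promptflow import PFClient

pf = PFClient()

# 1) Batch run: execute the application flow once per row of the dataset.
base_run = pf.run(
    flow="flows/chat",
    data="data/questions.jsonl",
    column_mapping={"question": "${data.question}"},
)

# 2) Evaluation run: score the batch run's outputs with an evaluation flow.
eval_run = pf.run(
    flow="flows/eval-relevance",
    data="data/questions.jsonl",
    run=base_run,  # link evaluator inputs to the base run's outputs
    column_mapping={
        "question": "${data.question}",
        "answer": "${run.outputs.answer}",
    },
)

# 3) Inspect per-row details and aggregated metrics.
print(pf.get_details(eval_run).head())
print(pf.get_metrics(eval_run))
```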
With PromptFlow, developers gain enhanced tracing and observability features, allowing for detailed monitoring of GenAI applications from input to output. This includes:
- Flexibility in Tracing: Support for various endpoints, including Azure AI Foundry and Azure Application Insights.
- Streamlined Deployment: Deploy optimized GenAI applications to Azure AI Foundry for secure and scalable development.
- Flex Flow: Incorporate your applications into Prompt Flow for comprehensive evaluation and debugging.
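As a small illustration of the tracing features mentioned above, the sketch below wraps an ordinary Python function with the `@trace` decorator from the `promptflow-tracing` package. The function body and names are placeholders, not code from this repo.

```python
# Minimal tracing sketch, assuming the promptflow-tracing package is installed.
# get_answer is a placeholder for your own application code.
from promptflow.tracing import start_trace, trace


@trace  # record this function's inputs and outputs in the trace tree
def get_answer(question: str) -> str:
    # Call your model here; a canned response is used for illustration.
    return f"echo: {question}"


if __name__ == "__main__":
    start_trace()  # start a local trace session viewable in the Prompt Flow UI
    print(get_answer("What does this repo evaluate?"))
```

Because `get_answer` is just a Python callable, the same function can typically also be handed to `pf.run(flow=get_answer, data=...)` as a flex flow for batch evaluation and debugging.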
PromptFlow's integration with Azure AI Foundry offers centralized test history, enhanced test analysis, and asset reutilization, facilitating:
- Centralized Test History: Store and track all historical tests for easy accessibility.
- Enhanced Analysis: Extract and visualize test results for comprehensive comparisons.
- Asset Reutilization: Streamline workflows by reusing previous test assets for efficiency.
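For reference, this is roughly how a run can be submitted to an Azure AI Foundry (AI Studio) project so that its history, metrics, and assets are stored centrally. It uses the `promptflow[azure]` extra and `DefaultAzureCredential`; the placeholder strings correspond to the `AZURE_AI_STUDIO_*` variables described in the setup section below, and the flow/data paths are illustrative only.

```python
# Hedged sketch: submit a run to an Azure AI Foundry project so its history,
# metrics, and assets live in the cloud workspace. Requires promptflow[azure].
# The placeholder strings map to the AZURE_AI_STUDIO_* variables described below.
from azure.identity import DefaultAzureCredential
from promptflow.azure import PFClient

pf_azure = PFClient(
    credential=DefaultAzureCredential(),
    subscription_id="<AZURE_AI_STUDIO_SUBSCRIPTION_ID>",
    resource_group_name="<AZURE_AI_STUDIO_RESOURCE_GROUP_NAME>",
    workspace_name="<AZURE_AI_STUDIO_PROJECT_NAME>",
)

# Same run definition as the local example, but executed and tracked in Azure.
cloud_run = pf_azure.run(
    flow="flows/chat",                     # illustrative path, not from this repo
    data="data/questions.jsonl",
    column_mapping={"question": "${data.question}"},
)
pf_azure.stream(cloud_run)  # follow the run's logs until it completes
```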
Before you begin, ensure you have the following:
- Access to Azure AI Foundry
Let's get your development environment set up:
Before running this notebook, you must configure certain environment variables to store your configuration securely. This practice helps prevent sensitive data from being accidentally committed to version control systems.
Create a `.env` file in your project root (use the provided `.env.sample` as a template) and add the following variables:
# Azure Open AI Completion Configuration
AZURE_AOAI_API_KEY=""
AZURE_AOAI_COMPLETION_MODEL_DEPLOYMENT_ID=""
AZURE_AOAI_ENDPOINT=""
AZURE_AOAI_DEPLOYMENT_VERSION=""
AZURE_AI_STUDIO_SUBSCRIPTION_ID=""
AZURE_AI_STUDIO_RESOURCE_GROUP_NAME=""
AZURE_AI_STUDIO_PROJECT_NAME=""
Please replace the placeholders with your actual Azure OpenAI and Azure AI Foundry configuration details:
- `AZURE_AOAI_API_KEY`: Your Azure OpenAI API key. You can obtain this from the Azure OpenAI service.
- `AZURE_AOAI_COMPLETION_MODEL_DEPLOYMENT_ID`: The deployment ID for your Azure OpenAI model.
- `AZURE_AOAI_ENDPOINT`: The endpoint URL for your Azure OpenAI service.
- `AZURE_AOAI_DEPLOYMENT_VERSION`: The version of your Azure OpenAI deployment.
- `AZURE_AI_STUDIO_SUBSCRIPTION_ID`: Your Azure subscription ID where the AI Studio project is hosted.
- `AZURE_AI_STUDIO_RESOURCE_GROUP_NAME`: The name of the resource group for your AI Studio project.
- `AZURE_AI_STUDIO_PROJECT_NAME`: The name of your AI Studio project.
To gather your Azure OpenAI API keys, visit the Azure OpenAI service documentation. For the values related to your project in Azure AI Foundry, you can find them in your project's settings within the Azure portal.
📌 Note: Remember not to commit the `.env` file to your version control system. Add it to your `.gitignore` file to prevent it from being tracked.
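As an illustration of how these variables are consumed (this snippet is not part of the repo), you can load the `.env` file with `python-dotenv` and pass the values to the Azure OpenAI client. Note that `AZURE_AOAI_DEPLOYMENT_VERSION` is assumed here to hold the API version string.

```python
# Illustrative only: load the .env file and build an Azure OpenAI client from it.
# Assumes AZURE_AOAI_DEPLOYMENT_VERSION holds the API version string.
import os

from dotenv import load_dotenv   # pip install python-dotenv
from openai import AzureOpenAI   # pip install openai>=1.0

load_dotenv()  # reads .env from the project root into os.environ

client = AzureOpenAI(
    api_key=os.environ["AZURE_AOAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_AOAI_ENDPOINT"],
    api_version=os.environ["AZURE_AOAI_DEPLOYMENT_VERSION"],
)

response = client.chat.completions.create(
    model=os.environ["AZURE_AOAI_COMPLETION_MODEL_DEPLOYMENT_ID"],  # deployment name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```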
Follow these steps to create a Conda environment and set up your VSCode for running Jupyter Notebooks:
Instructions for Windows users:
- Create the Conda Environment:
  - In your terminal or command line, navigate to the repository directory.
  - Execute the following command to create the Conda environment using the `environment.yaml` file: `conda env create -f environment.yaml`
  - This command creates a Conda environment as defined in `environment.yaml`.
- Activate the Environment:
  - After creation, activate the new Conda environment by using: `conda activate promptflow-eval-framework`
Instructions for Linux users (or Windows users with WSL or another Linux setup):
- Use `make` to Create the Conda Environment:
  - In your terminal or command line, navigate to the repository directory and look at the Makefile.
  - Execute the `make` command specified below to create the Conda environment using the `environment.yaml` file: `make create_conda_env`
- Activate the Environment:
  - After creation, activate the new Conda environment by using: `conda activate promptflow-eval-framework`
- Install Required Extensions:
  - Download and install the Python and Jupyter extensions for VSCode. These extensions provide support for running and editing Jupyter Notebooks within VSCode.
- Open the Notebook:
  - Open the Jupyter Notebook file (`01-promptflow-evaluation-howto.ipynb`) in VSCode.
- Attach Kernel to VSCode:
  - After creating the Conda environment, it should be available in the kernel selection dropdown in the top-right corner of the VSCode interface.
  - Select your newly created environment (`promptflow-eval-framework`) from the dropdown. This sets it as the kernel for running your Jupyter Notebooks.
- Run the Notebook:
  - Once the kernel is attached, you can run the notebook by clicking the "Run All" button in the top menu, or by running each cell individually.
By following these steps, you'll establish a dedicated Conda environment for your project and configure VSCode to run Jupyter Notebooks efficiently. This environment includes all the necessary dependencies specified in your `environment.yaml` file. If you wish to add more packages or change versions, use `pip install` in a notebook cell or in the terminal after activating the environment, and then restart the kernel. The changes are applied automatically after the session restarts.
- Prompt Flow Documentation: For detailed information on Prompt Flow and its components, visit our Documentation.
- Tutorials: Check out our Tutorials for hands-on guides on setting up and utilizing Prompt Flow for LLM/SLM evaluation.
Important
This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.