# Get Token

get /v1/token

# Create Workflows Run

post /v1/evaluate/runs

Create a new Evaluate run with workflows. The request body should contain the `workflows` to be ingested and evaluated. Additionally, specify the `project_id` or `project_name` to which the workflows should be ingested. If the project does not exist, it will be created; if it exists, the workflows will be logged to it. If both `project_id` and `project_name` are provided, `project_id` takes precedence. The `run_name` is optional and will be auto-generated (timestamp-based) if not provided. The body is also expected to include the configuration for the scorers to be used in the evaluation. This configuration will be used to evaluate the workflows and generate the results.

# API Reference | Getting Started with Galileo

Get started with Galileo's REST API: learn about base URLs, authentication methods, and how to verify your API setup for seamless integration.

Galileo provides a public REST API that you can use to interact with the Galileo platform. This API allows you to perform various operations across Evaluate, Observe, and Protect. This guide will help you get started with the Galileo REST API.

## Base API URL

To talk to the Galileo API, you first need the base URL of your Galileo API instance. If you know the URL that you use to access the Galileo console, replace `console` in it with `api`. For example, if your Galileo console URL is `https://console.galileo.myenterprise.com`, then your base URL for the API is `https://api.galileo.myenterprise.com`.

### Verify the Base URL

To verify the base URL of your Galileo API instance, send a `GET` request to the [`healthcheck` endpoint](/api-reference/health/healthcheck).

```bash
curl -X GET https://api.galileo.myenterprise.com/v1/healthcheck
```

## Authentication

For interacting with our public endpoints, you can use any of the following methods to authenticate your requests:

### API Key

To use your [API key](/galileo/gen-ai-studio-products/galileo-evaluate/quickstart#getting-an-api-key) to authenticate your requests, include the key in the HTTP headers of your requests.

```json
{ "Galileo-API-Key": "<your-api-key>" }
```

### HTTP Basic Auth

To use HTTP Basic Auth to authenticate your requests, include your base64-encoded `username:password` credentials in the HTTP headers of your requests.

```json
{ "Authorization": "Basic <base64-encoded username:password>" }
```

### JWT Token

To use a JWT token to authenticate your requests, include the token in the HTTP headers of your requests.

```json
{ "Authorization": "Bearer <your-jwt-token>" }
```

We recommend this method for high-volume requests because it is more secure (the token expires after 24 hours) and more scalable than using an API key. To generate a JWT token, send a `GET` request to the [`get-token` endpoint](/api-reference/auth/get-token) using your API key or HTTP Basic Auth.

# Healthcheck

get /v1/healthcheck

# Get Observe Workflows

post /v1/observe/projects/{project_id}/workflows

Get workflows for a specific run in an Observe project.

# Log Workflows

post /v1/observe/workflows

Log workflows to an Observe project. The request body should contain the `workflows` to be ingested. Additionally, specify the `project_id` or `project_name` to which the workflows should be ingested. If the project does not exist, it will be created; if it exists, the workflows will be logged to it. If both `project_id` and `project_name` are provided, `project_id` takes precedence.
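As a minimal sketch of the request envelope (the `workflows` and `project_name` fields come from the endpoint description above; the project name is a placeholder, and the exact workflow object schema is documented in this endpoint's reference):

```bash
# Skeletal example only: populate "workflows" with workflow objects
# as described in the Log Workflows endpoint reference.
curl -X POST https://api.galileo.myenterprise.com/v1/observe/workflows \
  -H "Galileo-API-Key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"project_name": "my-observe-project", "workflows": []}'
```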
# Invoke

post /v1/protect/invoke

# WorkflowStep

# Python Client Reference | Galileo Evaluate

Integrate Galileo's Evaluate module into your Python applications with this guide, featuring installation steps and examples for prompt quality assessment.

For a full reference of promptquality, check out: [https://promptquality.docs.rungalileo.io/](https://promptquality.docs.rungalileo.io/)

## Installation

`pip install promptquality`

## Evaluate

```py
import promptquality as pq

pq.login("YOUR_GALILEO_URL")

template = "Explain {topic} to me like I'm a 5 year old"
data = {"topic": ["Quantum Physics", "Politics", "Large Language Models"]}

pq.run(project_name='my_first_project',
       template=template,
       dataset=data,
       settings=pq.Settings(model_alias='ChatGPT (16K context)',
                            temperature=0.8,
                            max_tokens=400))
```

# TypeScript Client Reference | Galileo Evaluate

Incorporate Galileo's Evaluate module into your TypeScript projects with this guide, providing setup instructions and workflow logging examples.

For a full reference, check out: [https://www.npmjs.com/package/@rungalileo/galileo](https://www.npmjs.com/package/@rungalileo/galileo)

## Installation

`npm install @rungalileo/galileo`

Set environment variables in a `.env` file.

```
GALILEO_CONSOLE_URL="https://console.galileo.yourcompany.com"
GALILEO_API_KEY="Your API Key"

# Alternatively, you can also use username/password.
GALILEO_USERNAME="Your Username"
GALILEO_PASSWORD="Your Password"
```

## Log Workflows

```TypeScript
import { GalileoEvaluateWorkflow } from "@rungalileo/galileo";

// Initialize and create project
const evaluateWorkflow = new GalileoEvaluateWorkflow("Evaluate Workflow Example");
await evaluateWorkflow.init();

// Evaluation dataset
const evaluateSet = [
  "What are hallucinations?",
  "What are intrinsic hallucinations?",
  "What are extrinsic hallucinations?"
];

// Add workflows
const myLlmApp = (input) => {
  const template = "Given the following context answer the question. \n Context: {context} \n Question: {question}";
  // Add workflow
  evaluateWorkflow.addWorkflow({ input });

  // Get context from Retriever
  // Pseudo-code, replace with your Retriever call
  const retrieverCall = () => "You're an AI assistant helping a user with hallucinations.";
  const context = retrieverCall();

  // Log Retriever Step
  evaluateWorkflow.addRetrieverStep({
    input: template,
    output: context
  });

  // Get response from your LLM
  // Pseudo-code, replace with your LLM call
  const prompt = template.replace('{context}', context).replace('{question}', input);
  const llmCall = (_prompt) => 'An LLM response…';
  const llmResponse = llmCall(prompt);

  // Log LLM step
  evaluateWorkflow.addLlmStep({
    durationNs: Math.floor(Math.random() * 3 * 1000000000),
    input: prompt,
    output: llmResponse,
  });

  // Conclude workflow
  evaluateWorkflow.concludeWorkflow(llmResponse);
};

evaluateSet.forEach((input) => myLlmApp(input));

// Configure run and upload workflows to Galileo
// Optional: Set run name, tags, registered scorers, and customized scorers
// Note: If no run name is provided, a timestamp will be used
await evaluateWorkflow.uploadWorkflows({
  adherence_nli: true,
  chunk_attribution_utilization_nli: true,
  completeness_nli: true,
  context_relevance: true,
  factuality: true,
  instruction_adherence: true,
  ground_truth_adherence: true,
  pii: true,
  prompt_injection: true,
  prompt_perplexity: true,
  sexist: true,
  tone: true,
  toxicity: true,
});
```

# Data Quality | Fine-Tune NLP Studio Client Reference

Enhance your data quality in Galileo's NLP and CV Studio using the `dataquality` Python package; find installation and usage details here.

For a full reference, check out: [https://dataquality.docs.rungalileo.io/](https://dataquality.docs.rungalileo.io/)

Installation: `pip install dataquality`

# Python Client Reference | Galileo Observe

Integrate Galileo's Observe module into your Python applications; access installation instructions and comprehensive documentation for workflow monitoring.

For a full reference, check out: [https://observe.docs.rungalileo.io/](https://observe.docs.rungalileo.io/)

## Installation

`pip install galileo-observe`

# TypeScript Client Reference | Galileo Observe

Integrate Galileo's Observe module into TypeScript applications with setup guides, sample code, and monitoring instructions for seamless workflow tracking.

For a full reference, check out: [https://www.npmjs.com/package/@rungalileo/galileo](https://www.npmjs.com/package/@rungalileo/galileo)

## Installation

`npm install @rungalileo/galileo`

Set environment variables in a `.env` file.

```
GALILEO_CONSOLE_URL="https://console.galileo.yourcompany.com"
GALILEO_API_KEY="Your API Key"

# Alternatively, you can also use username/password.
GALILEO_USERNAME="Your Username"
GALILEO_PASSWORD="Your Password"
```

## Log Workflows

```TypeScript
import { GalileoObserveWorkflow } from "@rungalileo/galileo";

// Initialize and create project
const observeWorkflow = new GalileoObserveWorkflow("Observe Workflow Example");
await observeWorkflow.init();

// Evaluation dataset
const observeSet = [
  "What are hallucinations?",
  "What are intrinsic hallucinations?",
  "What are extrinsic hallucinations?"
];

// Add workflows
const myLlmApp = (input) => {
  const template = "Given the following context answer the question. \n Context: {context} \n Question: {question}";
  // Add workflow
  observeWorkflow.addWorkflow({ input });

  // Get context from Retriever
  // Pseudo-code, replace with your Retriever call
  const retrieverCall = () => "You're an AI assistant helping a user with hallucinations.";
  const context = retrieverCall();

  // Log Retriever Step
  observeWorkflow.addRetrieverStep({
    input: template,
    output: context
  });

  // Get response from your LLM
  // Pseudo-code, replace with your LLM call
  const prompt = template.replace('{context}', context).replace('{question}', input);
  const llmCall = (_prompt) => 'An LLM response…';
  const llmResponse = llmCall(prompt);

  // Log LLM step
  observeWorkflow.addLlmStep({
    durationNs: Math.floor(Math.random() * 3 * 1000000000),
    input: prompt,
    output: llmResponse,
  });

  // Conclude workflow
  observeWorkflow.concludeWorkflow(llmResponse);
};

observeSet.forEach((input) => myLlmApp(input));

// Upload workflows to Galileo
await observeWorkflow.uploadWorkflows();
```

# Client References

Explore Galileo's client references, including Python and TypeScript integrations, to streamline Evaluate, Observe, and Protect module implementations.

Tutorials and full Client References for Galileo's modules.

## Evaluate

## Observe

## Protect

## Finetune and NLP Studio

# Python Client Reference | Galileo Protect

Integrate Galileo's Protect module into Python workflows with this guide, including code examples, setup instructions, and ruleset invocation details.

For a full reference, check out: [https://protect.docs.rungalileo.io/](https://protect.docs.rungalileo.io/)

### Step 1: Install galileo-protect

`pip install galileo-protect`

### Step 2: Set your Console URL and API Key, create a project and stage

Example:

```py
import galileo_protect as gp
import os

os.environ['GALILEO_API_KEY'] = "Your Galileo API key"
os.environ['GALILEO_CONSOLE_URL'] = "Your Galileo Console Url"

project = gp.create_project('my first protect project')
project_id = project.id

stage = gp.create_stage(name="my first stage", project_id=project_id)
stage_id = stage.id
```

### Step 3: Integrate Galileo Protect with your app

Galileo Protect can be embedded in your production application through `gp.invoke()` like below:

```py
USER_QUERY = 'What\'s my SSN? Hint: my SSN is 123-45-6789'
MODEL_RESPONSE = 'Your SSN is 123-45-6789'

response = gp.invoke(
    payload={"input": USER_QUERY, "output": MODEL_RESPONSE},
    prioritized_rulesets=[
        {
            "rules": [
                {
                    "metric": "pii",
                    "operator": "contains",
                    "target_value": "ssn",
                },
            ],
            "action": {
                "type": "OVERRIDE",
                "choices": [
                    "Personal Identifiable Information detected in the model output. Sorry, I cannot answer that question."
                ],
            },
        },
    ],
    stage_id=stage_id,
    timeout=10,  # number of seconds before timing out
)
```

# Data Privacy And Compliance

This page covers concerns regarding data residency and the compliance standards Galileo meets.

## Security Standards

Clusters hosted by Galileo run in Amazon Web Services, ensuring a high degree of physical security and environmental control. All intermediate environments that transfer or store data are reviewed to meet rigid security standards.

## Incident Response, Disaster Recovery & Business Continuity

Galileo has a well-defined incident response and disaster recovery policy.
In the unlikely event of an incident, Galileo will:

* Assemble response team members, including two assigned on-call engineers available at all times of day
* Immediately revoke relevant access or passwords
* Notify Galileo's Engineering and Customer Success Teams
* Notify impacted customers of the intrusion and if/how their data was compromised
* Provide a resolution timeline
* Conduct an audit of systems to ascertain the source of the breach
* Refine existing practices to prevent future impact and harden systems
* Communicate the improvement plan to impacted customers

## Compliance

Galileo provides ongoing training for employees on all information security practices and policies, and maintains measures to address violations of procedures. As part of onboarding and off-boarding team members, access controls are managed so that employees are only given the access their role requires. Galileo is SOC 2 Type 1 and Type 2 compliant, and adheres to the requirements of this compliance throughout the year, including independent audits.

# Dependencies

Understand Galileo deployment prerequisites and dependencies to ensure a smooth installation and integration across supported platforms.

### Core Dependencies

* Kubernetes Cluster: Galileo is deployed within a Kubernetes environment, leveraging various Kubernetes resources.

### Data Stores

* PostgreSQL: Used for persistent data storage (if not using AWS RDS or GCP CloudSQL).
* ClickHouse: A columnar database used for storing and querying large volumes of data efficiently. It supports analytics and real-time reporting.
* MinIO: Serves as the object storage solution (if not using AWS S3 or GCP Cloud Storage).

### Messaging

* RabbitMQ: Acts as the message broker for asynchronous communication.

### Monitoring and Logging

* Prometheus: For metrics collection and monitoring. This will also send metrics to Galileo's centralized Grafana server for observability.
* Prometheus Adapter: This component is crucial for enabling the Kubernetes Horizontal Pod Autoscaler (HPA) to use Prometheus metrics for scaling applications. It must be activated through the `.Values.prometheus_adapter.enabled` Helm configuration. Take care to avoid conflicts with existing services, such as the metrics-server, which may require resource renaming for seamless integration.
* Grafana: For visualizing metrics. Optional, as users might not require metric visualization.
* Fluentd: For logging and forwarding to AWS CloudWatch. Optional, depending on the logging and log forwarding requirements.
* Alertmanager: Manages alerts for the monitoring system. Optional, if no alerting is needed or a different alerting mechanism is in place.

Ensure that the corresponding Helm values (`prometheus_adapter.enabled`, `fluentd.enabled`, `alertmanager.enabled`) are configured according to your deployment needs.

### Networking

* Ingress NGINX: Manages external access to the services.
* Calico: Provides network policies.
* Cert-Manager: Handles certificate management.

### Configuration and Management

* Helm: Galileo leverages Helm for package management and deployment. Ensure Helm is configured correctly to deploy the charts listed above.

### Miscellaneous

* Cluster Autoscaler: Automatically adjusts the size of the Kubernetes cluster.
* Kube-State-Metrics: Generates metrics about the state of Kubernetes objects.
* Metrics Server: Aggregates resource usage data.
* Node Exporter: Collects metrics from the nodes.
* ClickHouse Keeper: Acts as the service for managing ClickHouse replicas and coordinating distributed tasks, similar to ZooKeeper. Essential for ClickHouse high availability and consistency.
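As an illustration of how the optional components above might be toggled at deploy time, a minimal sketch, assuming a Helm release named `galileo` and a chart reference supplied by Galileo (both placeholders here):

```bash
# Toggle the optional monitoring components via the Helm values named above.
# Release name and chart reference are placeholders, not the actual chart name.
helm upgrade galileo <galileo-chart> \
  --set prometheus_adapter.enabled=true \
  --set fluentd.enabled=false \
  --set alertmanager.enabled=false
```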
# Azure AKS

This page details the steps to deploy a Galileo Kubernetes cluster in Microsoft Azure's AKS service environment.

**Total time for deployment:** 30-45 minutes

## Recommended Cluster Configuration

| Configuration | Recommended Value |
| --- | --- |
| **Nodes in the cluster's core nodegroup** | 4 (min) 5 (max) 4 (desired) |
| **CPU per core node** | 4 CPU |
| **RAM per core node** | 16 GiB RAM |
| **Number of nodes in the cluster's runners nodegroup** | 1 (min) 5 (max) 1 (desired) |
| **CPU per runner node** | 8 CPU |
| **RAM per runner node** | 32 GiB RAM |
| **Minimum volume size per node** | 200 GiB |
| **Required Kubernetes API version** | 1.21 |
| **Storage class** | standard |

## Step 1: [Optional] Create a dedicated resource group for the Galileo cluster

```sh
az group create --name galileo --location eastus
```

## Step 2: Provision an AKS cluster

```sh
az aks create -g galileo -n galileo --enable-managed-identity --node-count 4 --max-count 7 --min-count 4 -s Standard_D4_v4 --nodepool-name gcore --nodepool-labels "galileo-node-type=galileo-core" --enable-cluster-autoscaler
```

## Step 3: Add the Galileo Runner nodepool

```sh
az aks nodepool add -g galileo -n grunner --cluster-name galileo --node-count 1 --max-count 5 --min-count 1 -s Standard_D8_v4 --labels "galileo-node-type=galileo-runner" --enable-cluster-autoscaler
```

## Step 4: Get cluster credentials

```sh
az aks get-credentials --resource-group galileo --name galileo
```

## Step 5: Apply the Galileo manifest

```sh
kubectl apply -f galileo.yaml
```

## Step 6: Customer DNS Configuration

Galileo has 4 main URLs (shown below). To make the URLs accessible across the company, you have to set the following DNS addresses in your DNS provider after the platform is deployed.

| Service | URL |
| --- | --- |
| API | **api.galileo**.company.\[com\|ai\|io…] |
| Data | **data.galileo**.company.\[com\|ai\|io…] |
| UI | **console.galileo**.company.\[com\|ai\|io…] |
| Grafana | **grafana.galileo**.company.\[com\|ai\|io…] |

## Creating a GPU-enabled Node Group

For specialized tasks that require GPU processing, such as machine learning workloads, Galileo supports the configuration of GPU-enabled node pools.

1. **Node Group Creation**: Create an `NCas_T4_v3-series` node group with name `galileo-ml`, min\_size 1, max\_size 5, and label `galileo-node-type=galileo-ml` (see the sketch after this list).
2. When this is done, please reach out to the Galileo team so that we can update the deployment config for you.
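A sketch of step 1 using the Azure CLI, following the pattern of the nodepool command in Step 3 above. The VM size shown is one NCas_T4_v3-series option and is an assumption; pick a size available in your region and quota:

```bash
# GPU nodepool sketch: min 1 / max 5 nodes, labeled for Galileo ML workloads.
az aks nodepool add \
  --resource-group galileo \
  --cluster-name galileo \
  --name galileoml \
  --node-vm-size Standard_NC4as_T4_v3 \
  --node-count 1 --min-count 1 --max-count 5 \
  --enable-cluster-autoscaler \
  --labels "galileo-node-type=galileo-ml"
```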
# Deploying Galileo on Amazon EKS

Deploy Galileo on Amazon EKS with a step-by-step guide for configuring, managing, and scaling Galileo's infrastructure using Kubernetes clusters.

## Setting Up Your Kubernetes Cluster with EKS, IAM, and Trust Policies for Galileo Applications

This guide provides a comprehensive walkthrough for configuring and deploying an EKS (Elastic Kubernetes Service) environment to support Galileo applications. Galileo applications are designed to operate efficiently on managed Kubernetes services like EKS (Amazon Elastic Kubernetes Service) and GKE (Google Kubernetes Engine). This document, however, will specifically address the setup process within an EKS environment, including the integration of IAM (Identity and Access Management) roles and Trust Policies, alongside configuring the necessary Galileo DNS endpoints.

### Prerequisites

Before you begin, ensure you have the following:

* An AWS account with administrative access
* `kubectl` installed on your local machine
* `aws-cli` version 2 installed and configured
* Basic knowledge of Kubernetes, AWS EKS, and IAM policies

The four steps to deploy Galileo onto an EKS environment are listed below.

### Setting Up the EKS Cluster

1. **Create an EKS Cluster**: Use the AWS Management Console or AWS CLI to create an EKS cluster in your preferred region. For the CLI, use the command `aws eks create-cluster` with the necessary parameters.
2. **Configure kubectl**: Once your cluster is active, configure `kubectl` to communicate with your EKS cluster by running `aws eks update-kubeconfig --region <region> --name <cluster-name>`.

### Configuring IAM Roles and Trust Policies

1. **Create IAM Roles for EKS**: Navigate to the IAM console and create a new role. Select "EKS" as the trusted entity and attach policies that grant required permissions for managing the cluster.
2. **Set Up Trust Policies**: Edit the trust relationship of the IAM roles to allow the EKS service to assume these roles on behalf of your Kubernetes pods.

### Integrating Galileo DNS Endpoints

1. **Determine Galileo DNS Endpoints**: Identify the four DNS endpoints required by Galileo applications to function correctly. These typically include endpoints for database connections, API gateways, telemetry services, and external integrations.
2. **Configure DNS in Kubernetes**: Utilize ConfigMaps or external-dns controllers in Kubernetes to route your applications to the identified Galileo DNS endpoints effectively.

### Deploying Galileo Applications

1. **Prepare Application Manifests**: Ensure your Galileo application Kubernetes manifests are correctly set up with the necessary configurations, including environment variables pointing to the Galileo DNS endpoints.
2. **Deploy Applications**: Use `kubectl apply` to deploy your Galileo applications onto the EKS cluster. Monitor the deployment status to ensure they are running as expected.

**Total time for deployment:** 30-45 minutes

**This deployment requires the use of AWS CLI commands. If you only have cloud console access, follow the optional instructions below to get** [**eksctl**](https://eksctl.io/introduction/#installation) **working with AWS CloudShell.**

### Step 0: (Optional) Deploying via AWS CloudShell

To use [`eksctl`](https://eksctl.io/introduction/#installation) via CloudShell in the AWS console, open a CloudShell session and do the following:

```
# Create directory
mkdir -p $HOME/.local/bin
cd $HOME/.local/bin

# eksctl
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl $HOME/.local/bin
```

The rest of the installation and deployment can now be run from the CloudShell session. You can use `vim` to create/edit the required yaml and json files within the shell session.
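As an optional sanity check before proceeding (both are standard eksctl and AWS CLI commands), confirm the tools run and your credentials resolve:

```bash
eksctl version
aws sts get-caller-identity
```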
### Recommended Cluster Configuration

Galileo recommends the following Kubernetes deployment configuration:

| Configuration | Recommended Value |
| --- | --- |
| **Nodes in the cluster's core nodegroup** | 4 (min) 5 (max) 4 (desired) |
| **CPU per core node** | 4 CPU |
| **RAM per core node** | 16 GiB RAM |
| **Number of nodes in the cluster's runners nodegroup** | 1 (min) 5 (max) 1 (desired) |
| **CPU per runner node** | 8 CPU |
| **RAM per runner node** | 32 GiB RAM |
| **Minimum volume size per node** | 200 GiB |
| **Required Kubernetes API version** | 1.21 |
| **Storage class** | gp2 |

Here's an [example EKS cluster configuration](/galileo/how-to-and-faq/enterprise-only/deploying-galileo-eks/eks-cluster-config-example).

### Step 1: Creating Roles and Policies for the Cluster

* **Galileo IAM Policy:** This policy is attached to the Galileo IAM Role. Add the following to a file called `galileo-policy.json`:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:AccessKubernetesApi",
        "eks:DescribeCluster"
      ],
      "Resource": "arn:aws:eks:CLUSTER_REGION:ACCOUNT_ID:cluster/CLUSTER_NAME"
    }
  ]
}
```

* **Galileo IAM Trust Policy:** This trust policy enables an external Galileo user to assume your Galileo IAM Role to deploy changes to your cluster securely. Add the following to a file called `galileo-trust-policy.json`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::273352303610:role/GalileoConnect"],
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

* **Galileo IAM Role with Policy:** The role should only include the Galileo IAM Policy created above. Create a file called `create-galileo-role-and-policies.sh`, make it executable with `chmod +x create-galileo-role-and-policies.sh`, and run it. Make sure to run it in the same directory as the json files created in the steps above.

```bash
#!/bin/sh -ex
aws iam create-policy --policy-name Galileo --policy-document file://galileo-policy.json
aws iam create-role --role-name Galileo --assume-role-policy-document file://galileo-trust-policy.json
aws iam attach-role-policy --role-name Galileo --policy-arn $(aws iam list-policies | jq -r '.Policies[] | select (.PolicyName == "Galileo") | .Arn')
```

### Step 2: Deploying the EKS Cluster

With the role and policies created, the cluster itself can be deployed in a single command using [eksctl](https://eksctl.io/introduction/#installation). Using the cluster template [here](/galileo/how-to-and-faq/enterprise-only/deploying-galileo-eks/eks-cluster-config-example), create a `galileo-cluster.yaml` file and edit the contents to replace `CUSTOMER_NAME` with your company name, like `galileo`. Also check and update all `availabilityZones` as appropriate. With the yaml file saved, run the following command to deploy the cluster:

```
eksctl create cluster -f galileo-cluster.yaml
```

### Step 3: EKS IAM Identity Mapping

This ensures that only users who have access to this role can deploy changes to the cluster. Account owners can also make changes. This is easy to do with [eksctl](https://eksctl.io/usage/iam-identity-mappings/) using the following command:

```sh
eksctl create iamidentitymapping --cluster customer-cluster --region your-region-id --arn "arn:aws:iam::CUSTOMER-ACCOUNT-ID:role/Galileo" --username galileo --group system:masters
```

**NOTE for the user:** For connected clusters, Galileo will apply changes from GitHub Actions, so github.com should be allow-listed for your cluster's ingress rules if you have any specific network requirements.
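To confirm the mapping was created, you can list the cluster's identity mappings:

```bash
eksctl get iamidentitymapping --cluster customer-cluster --region your-region-id
```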
### **Step 4: Required Configuration Values**

Customer-specific cluster values (e.g. domain name, Slack channel for notifications, etc.) will be placed in a base64 encoded string, stored as a secret in GitHub, that Galileo's deployment automation will read in and use when templating a cluster's resource files.

| Mandatory Field | Description |
| --- | --- |
| **AWS Account ID** | The customer's AWS Account ID that the customer will use for provisioning Galileo |
| **Galileo IAM Role Name** | The AWS IAM Role name the customer has created for the Galileo deployment account to assume. |
| **EKS Cluster Name** | The EKS cluster name that Galileo will deploy the platform to. |
| **Domain Name** | The domain name the customer wishes to deploy the cluster under, e.g. google.com |
| **Root subdomain** | e.g. "galileo" as in galileo.google.com |
| **Trusted SSL Certificates (Optional)** | By default, Galileo provisions Let's Encrypt certificates. But if you wish to use your own trusted SSL certificates, you should submit a base64 encoded string of (1) the full certificate chain, and (2) another, separate base64 encoded string of the signing key. |
| **AWS Access Key ID and Secret Access Key for Internal S3 Uploads (Optional)** | If you would like to export data into an S3 bucket of your choice, please let us know the access key and secret key of the account that can make those upload calls. |

**NOTE for the user:** Let Galileo know if you'd like to use Let's Encrypt or your own certificate before deployment.

### Step 5: Access to Deployment Logs

As a customer, you have full access to the deployment logs in Google Cloud Storage. You (the customer) are able to view all configuration there. A customer email address must be provided to have access to this log.

### **Step 6: Customer DNS Configuration**

Galileo has 4 main URLs (shown below). To make the URLs accessible across the company, you have to set the following DNS addresses in your DNS provider after the platform is deployed.

**Time taken:** 5-10 minutes (post the ingress endpoint / load balancer provisioning)

| Service | URL |
| --- | --- |
| API | **api.galileo**.company.\[com\|ai\|io…] |
| Data | **data.galileo**.company.\[com\|ai\|io…] |
| UI | **console.galileo**.company.\[com\|ai\|io…] |
| Grafana | **grafana.galileo**.company.\[com\|ai\|io…] |

Each URL must be entered as a CNAME record into your DNS management system, pointing to the ELB address. You can find this address by listing the Kubernetes ingresses that the platform has provisioned.
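For example, either of the following surfaces the provisioned address (the `ingress-nginx-controller` service in the `galileo` namespace is the one referenced later in the zero-access guide on this page):

```bash
# The ADDRESS / EXTERNAL-IP column holds the ELB hostname for the CNAME records.
kubectl get ingress --all-namespaces
kubectl -n galileo get svc ingress-nginx-controller
```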
## Creating a GPU-enabled Node Pool

For specialized tasks that require GPU processing, such as machine learning workloads, Galileo supports the configuration of GPU-enabled node pools. Here's how you can set up and manage a node pool with GPU-enabled nodes using `eksctl`, a command line tool for creating and managing Kubernetes clusters on Amazon EKS.

1. **Node Pool Creation**: Use `eksctl` to create a node pool with an Amazon Machine Image (AMI) that supports GPUs. This example uses `g6.2xlarge` instances and specifies a GPU-compatible AMI.

```
eksctl create nodegroup --cluster your-cluster-name --name galileo-ml --node-type g6.2xlarge --nodes-min 1 --nodes-max 5 --node-ami ami-0656ebce2c7921ec0 --node-labels "galileo-node-type=galileo-ml" --region your-region-id
```

In this command, replace `your-cluster-name` and `your-region-id` with your specific details. The `--node-ami` option is used to specify the exact AMI that supports CUDA and GPU workloads.

2. If the cluster has low usage and you want to save costs, you may also choose a cheaper GPU instance like `g4dn.2xlarge`. Note that this only saves costs when usage is too low to saturate one GPU; otherwise it can even cost more. Don't choose this option if you use **Protect**, which requires low real-time latency.

## Using a Managed RDS Postgres DB Server

To use a managed RDS Postgres DB server, create an RDS Aurora cluster directly in the AWS console, then create a Kubernetes Secret and ConfigMap so that the Galileo app can use them to connect to the DB server.

### Creating the RDS Aurora cluster

1. Go to AWS Console → RDS Service and create an RDS subnet group.
   * Select the VPC in which the EKS cluster is running.
   * Select AZs A and B and the respective private subnets.
2. Next, create an RDS Aurora Postgres cluster. The configuration for the cluster is listed below. General fields like cluster name, username, password, etc. can be entered as per cloud best practice.

| Field | Recommended Value |
| --- | --- |
| **Engine Version** | 16.x |
| **DB Instance class** | db.t3.medium |
| **VPC** | EKS cluster VPC ID |
| **DB Subnet Group** | Select subnet group created in step 1 |
| **Security Group ID** | Select primary EKS cluster SG |
| **Enable Encryption** | true |

3. Create the K8s Secret.
   * **Kubernetes resources:** Add the following to a file called `galileo-rds-details.yaml`. Update all `${xxx}` markers with appropriate values, then run `kubectl apply -f galileo-rds-details.yaml`.

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: galileo
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres
  namespace: galileo
type: Opaque
# stringData accepts plain (non-base64) values; Kubernetes encodes them on write.
stringData:
  GALILEO_POSTGRES_USER: "${db_username}"
  GALILEO_POSTGRES_PASSWORD: "${db_master_password}"
  GALILEO_POSTGRES_REPLICA_PASSWORD: "${db_master_password}"
  GALILEO_DATABASE_URL_WRITE: "postgresql+psycopg2://${db_username}:${db_master_password}@${db_endpoint}/${database_name}"
  GALILEO_DATABASE_URL_READ: "postgresql+psycopg2://${db_username}:${db_master_password}@${db_endpoint}/${database_name}"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: galileo
  labels:
    app: grafana
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - access: proxy
        isDefault: true
        name: prometheus
        type: prometheus
        url: "http://prometheus.galileo.svc.cluster.local:9090"
        version: 1
      - name: postgres
        type: postgres
        url: "${db_endpoint}"
        database: ${database_name}
        user: ${db_username}
        secureJsonData:
          password: ${db_master_password}
        jsonData:
          sslmode: "disable"
---
```
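After applying, a quick check that the resources landed (names as defined in the manifest above):

```bash
kubectl -n galileo get secret postgres
kubectl -n galileo get configmap grafana-datasources
```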
# Zero Access Deployment | Galileo on EKS

Create a private Kubernetes Cluster with EKS in your AWS Account, upload containers to your container registry, and deploy Galileo.

**Total time for deployment:** 45-60 minutes

**This deployment requires the use of AWS CLI commands. If you only have cloud console access, follow the optional instructions below to get** [**eksctl**](https://eksctl.io/introduction/#installation) **working with AWS CloudShell.**

### Step 0: (Optional) Deploying via AWS CloudShell

To use [`eksctl`](https://eksctl.io/introduction/#installation) via CloudShell in the AWS console, open a CloudShell session and do the following:

```sh
# Create directory
mkdir -p $HOME/.local/bin
cd $HOME/.local/bin

# eksctl
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl $HOME/.local/bin
```

The rest of the installation and deployment can now be run from the CloudShell session. You can use `vim` to create/edit the required yaml and json files within the shell session.

### Recommended Cluster Configuration

Galileo recommends the following Kubernetes deployment configuration:

| Configuration | Recommended Value |
| --- | --- |
| **Nodes in the cluster's core nodegroup** | 4 (min) 5 (max) 4 (desired) |
| **CPU per core node** | 4 CPU |
| **RAM per core node** | 16 GiB RAM |
| **Number of nodes in the cluster's runners nodegroup** | 1 (min) 5 (max) 1 (desired) |
| **CPU per runner node** | 8 CPU |
| **RAM per runner node** | 32 GiB RAM |
| **Minimum volume size per node** | 200 GiB |
| **Required Kubernetes API version** | 1.21 |
| **Storage class** | gp2 |

Here's an [example EKS cluster configuration](/galileo/how-to-and-faq/enterprise-only/deploying-galileo-eks-zero-access/eks-cluster-config-example-zero-access).

### Step 1: Deploying the EKS Cluster

The cluster itself can be deployed in a single command using [eksctl](https://eksctl.io/introduction/#installation). Using the cluster template [here](/galileo/how-to-and-faq/enterprise-only/deploying-galileo-eks-zero-access/eks-cluster-config-example-zero-access), create a `galileo-cluster.yaml` file and edit the contents to replace `CLUSTER_NAME` with a name for your cluster, like `galileo`. Also check and update all `availabilityZones` as appropriate. With the yaml file saved, run the following command to deploy the cluster:

```sh
eksctl create cluster -f galileo-cluster.yaml
```

### **Step 2: Required Configuration Values**

Customer-specific cluster values (e.g. domain name, Slack channel for notifications, etc.) will be placed in a base64 encoded string, stored as a secret in GitHub, that Galileo's deployment automation will read in and use when templating a cluster's resource files.

**Mandatory fields the Galileo team requires:**

| Mandatory Field | Description |
| --- | --- |
| **Domain Name** | The domain name the customer wishes to deploy the cluster under, e.g. google.com |
| **Root subdomain** | e.g. "**galileo**" as in **galileo**.google.com |
| **Trusted SSL Certificates** | These certificates should support the provided domain name. You should submit two base64 encoded strings: (1) one for the full certificate chain, and (2) one for the signing key. |

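For instance, assuming the chain and key live in hypothetical files `fullchain.pem` and `signing-key.pem`, the two strings could be produced like this (on macOS, use `base64 -i` instead of `-w0`):

```bash
base64 -w0 fullchain.pem > fullchain.b64
base64 -w0 signing-key.pem > signing-key.b64
```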
### Step 3: Deploy the Galileo Applications

VPN access is required to connect to the Kubernetes API when interacting with a private cluster. If you do not have appropriate VPN access with private DNS resolution, you can use a bastion machine with public ssh access as a bridge to the private cluster. The bastion will only act as a simple shell environment, so a machine type of `t3.micro` or equivalent will suffice.

Except where specifically noted, these steps are to be performed on a machine with internet access.

1. Download version 1.23 of `kubectl` as explained [here](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html), and `scp` that file to the working directory of the bastion.
2. Generate the cluster config file by running `aws eks update-kubeconfig --name $CLUSTER_NAME --region $REGION`.
3. If using a bastion machine, prepare the required environment with the following:
   1. Either `scp` or copy and paste the contents of `~/.kube/config` from your local machine to the same directory on the bastion.
   2. `scp` the provided `deployment-manifest.yaml` file to the working directory of the bastion.
4. With your VPN connected, or after ssh'ing into the bastion's shell if using one:
   1. Run `kubectl cluster-info` to verify your cluster config is set appropriately. If the cluster information is returned, you can proceed with the deployment.
   2. Run `kubectl apply -f deployment-manifest.yaml` to deploy the Galileo applications. Re-run this command if there are errors related to custom resources not being defined, as there are sometimes race conditions when applying large templates.

### **Step 4: Customer DNS Configuration**

Galileo has 4 main URLs (shown below). To make the URLs accessible across the company, you have to set the following DNS addresses in your DNS provider after the platform is deployed.

**Time taken:** 5-10 minutes (post the ingress endpoint / load balancer provisioning)

| Service | URL |
| --- | --- |
| API | **api.galileo**.company.\[com\|ai\|io…] |
| Data | **data.galileo**.company.\[com\|ai\|io…] |
| UI | **console.galileo**.company.\[com\|ai\|io…] |
| Grafana | **grafana.galileo**.company.\[com\|ai\|io…] |

Each URL must be entered as a CNAME record into your DNS management system, pointing to the ELB address. You can find this address by running `kubectl -n galileo get svc/ingress-nginx-controller` and looking at the value for `EXTERNAL-IP`.

# EKS Cluster Config Example | Zero Access Deployment

Access a zero-access EKS cluster configuration example for secure Galileo deployments on Amazon EKS, following best practices for Kubernetes security.
```yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: CLUSTER_NAME
  region: us-east-2
  version: "1.23"
  tags:
    env: CLUSTER_NAME

vpc:
  id: VPC_ID
  subnets:
    private:
      us-east-2a:
        id: SUBNET_1_ID
      us-east-2b:
        id: SUBNET_2_ID

cloudWatch:
  clusterLogging:
    enableTypes: ["*"]

privateCluster:
  enabled: true

addons:
  - name: vpc-cni
    version: 1.11.0
  - name: aws-ebs-csi-driver
    version: 1.11.4

managedNodeGroups:
  - name: galileo-core
    privateNetworking: true
    availabilityZones: ["us-east-2a", "us-east-2b"]
    labels: { galileo-node-type: galileo-core }
    tags:
      k8s.io/cluster-autoscaler/CLUSTER_NAME: "owned"
      k8s.io/cluster-autoscaler/enabled: "true"
    amiFamily: AmazonLinux2
    instanceType: m5a.xlarge
    minSize: 4
    maxSize: 5
    desiredCapacity: 4
    volumeSize: 200 # GiB
    volumeType: gp2
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
        ebs: true
    updateConfig:
      maxUnavailable: 2
  - name: galileo-runner
    privateNetworking: true
    availabilityZones: ["us-east-2a", "us-east-2b"]
    labels: { galileo-node-type: galileo-runner }
    tags:
      k8s.io/cluster-autoscaler/CLUSTER_NAME: "owned"
      k8s.io/cluster-autoscaler/enabled: "true"
    amiFamily: AmazonLinux2
    instanceType: m5a.2xlarge
    minSize: 1
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 200 # GiB
    volumeType: gp2
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
        ebs: true
    updateConfig:
      maxUnavailable: 2
```

# EKS Cluster Config Example | Galileo Deployment

Review a detailed EKS cluster configuration example for deploying Galileo on Amazon EKS, ensuring efficient Kubernetes setup and management.
```yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: CLUSTER_NAME
  region: us-east-2
  version: "1.28"
  tags:
    env: CLUSTER_NAME

availabilityZones: ["us-east-2a", "us-east-2b"]

cloudWatch:
  clusterLogging:
    enableTypes: ["*"]

addons:
  - name: vpc-cni
    version: 1.13.4
  - name: aws-ebs-csi-driver
    version: 1.29.1

managedNodeGroups:
  - name: galileo-core
    privateNetworking: true
    availabilityZones: ["us-east-2a", "us-east-2b"]
    labels: { galileo-node-type: galileo-core }
    tags:
      k8s.io/cluster-autoscaler/CLUSTER_NAME: "owned"
      k8s.io/cluster-autoscaler/enabled: "true"
    amiFamily: AmazonLinux2
    instanceType: m5a.xlarge
    minSize: 2
    maxSize: 5
    desiredCapacity: 2
    volumeSize: 200
    volumeType: gp3
    volumeEncrypted: true
    disableIMDSv1: false
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
        ebs: true
    updateConfig:
      maxUnavailable: 2
  - name: galileo-runner
    privateNetworking: true
    availabilityZones: ["us-east-2a", "us-east-2b"]
    labels: { galileo-node-type: galileo-runner }
    tags:
      k8s.io/cluster-autoscaler/CLUSTER_NAME: "owned"
      k8s.io/cluster-autoscaler/enabled: "true"
    amiFamily: AmazonLinux2
    instanceType: m5a.2xlarge
    minSize: 1
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 200 # GiB
    volumeType: gp3
    volumeEncrypted: true
    disableIMDSv1: false
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
        ebs: true
    updateConfig:
      maxUnavailable: 1
```

# Updating Cluster

Galileo EKS cluster update from 1.21 to 1.23.

### Prerequisites

The AWS EBS CSI plugin has to be installed. This can be added to the `addons` section in the eksctl config file:

```
addons:
  - name: aws-ebs-csi-driver
    version: 1.11.4
```

The Amazon EBS CSI plugin requires IAM permissions to make calls to AWS APIs on your behalf, so an additional EBS policy has to be attached to the existing Galileo node groups. This can be added in the eksctl config file:

```
withAddonPolicies:
  ebs: true
```

Apply the changes to the node groups:

```
eksctl update nodegroup -f cluster-config.yaml
```

### Upgrade to 1.23

Because Amazon EKS runs a highly available control plane, you can update only one minor version at a time. The current cluster version is 1.21 and you want to update to 1.23: you must first update your cluster to 1.22 and then update your 1.22 cluster to 1.23.
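Before starting, it can help to confirm the current control plane and node versions (assuming `kubectl` is already pointed at the cluster):

```bash
kubectl version
kubectl get nodes
```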
#### Upgrade control plane to 1.22

```
eksctl upgrade cluster --name CLUSTER_NAME --version 1.22 --approve
```

#### Upgrade node groups to 1.22

```
eksctl upgrade nodegroup --name=galileo-runner --cluster=CLUSTER_NAME --kubernetes-version=1.22
eksctl upgrade nodegroup --name=galileo-core --cluster=CLUSTER_NAME --kubernetes-version=1.22
```

#### Upgrade control plane to 1.23

```
eksctl upgrade cluster --name CLUSTER_NAME --version 1.23 --approve
```

#### Upgrade node groups to 1.23

```
eksctl upgrade nodegroup --name=galileo-core --cluster=CLUSTER_NAME --kubernetes-version=1.23
eksctl upgrade nodegroup --name=galileo-runner --cluster=CLUSTER_NAME --kubernetes-version=1.23
```

#### Post-upgrade checks

Check that all pods are in a ready state:

```
kubectl get pods --all-namespaces -o go-template='{{ range $item := .items }}{{ range .status.conditions }}{{ if (or (and (eq .type "PodScheduled") (eq .status "False")) (and (eq .type "Ready") (eq .status "False"))) }}{{ $item.metadata.name}} {{ end }}{{ end }}{{ end }}'
```

Check for pending persistent volume claims:

```
kubectl get pvc --all-namespaces | grep -i pending
```

# Exoscale

Galileo applications run on managed Kubernetes environments; this document specifically covers the configuration and deployment of an Exoscale SKS environment.

**Total time for deployment:** 30-45 minutes

**This deployment requires the use of** [**Exoscale CLI commands**](https://community.exoscale.com/documentation/tools/exoscale-command-line-interface/)**. Before you start, install the Exo CLI following the official documentation.**

## Recommended Cluster Configuration

| Configuration | Recommended Value |
| --- | --- |
| Nodes in the cluster's core nodegroup | 5 |
| CPU per core node | 4 CPU |
| RAM per core node | 16 GiB RAM |
| Minimum volume size per core node | 400 GiB |
| Number of nodes in the cluster's runners nodegroup | 2 |
| CPU per runner node | 8 CPU |
| RAM per runner node | 32 GiB RAM |
| Minimum volume size per runner node | 200 GiB |
| Required Kubernetes API version | 1.24 |

## Deploying the SKS Cluster

1. **Create security groups**

```sh
exo compute security-group create sks-security-group

exo compute security-group rule add sks-security-group \
  --description "NodePort services" \
  --protocol tcp \
  --network 0.0.0.0/0 \
  --port 30000-32767

exo compute security-group rule add sks-security-group \
  --description "SKS kubelet" \
  --protocol tcp \
  --port 10250 \
  --security-group sks-security-group

exo compute security-group rule add sks-security-group \
  --description "Calico traffic" \
  --protocol udp \
  --port 4789 \
  --security-group sks-security-group
```

2. **Create SKS cluster**

```sh
exo compute sks create galileo \
  --kubernetes-version "1.24" --zone ch-gva-2 \
  --nodepool-name galileo-core \
  --nodepool-size 6 \
  --nodepool-disk-size 400 \
  --nodepool-instance-prefix "galileo-core" \
  --nodepool-instance-type "extra-large" \
  --nodepool-label "galileo-node-type=galileo-core" \
  --nodepool-security-group sks-security-group

exo compute sks nodepool add galileo galileo-runner \
  --zone ch-gva-2 \
  --size 2 \
  --disk-size 400 \
  --instance-prefix "galileo-runner" \
  --instance-type "extra-large" \
  --label "galileo-node-type=galileo-runner" \
  --security-group sks-security-group
```

## Deploy distributed block storage

Longhorn is open-source software that you can install inside your SKS cluster. Installation takes a few minutes; you need an SKS cluster and kubectl access to it.
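To get kubectl access, the kubeconfig can be pulled via the Exoscale CLI. A sketch, assuming the cluster name and zone used in the commands above (the user name is a placeholder):

```bash
# Fetch an admin kubeconfig for the cluster created above.
exo compute sks kubeconfig galileo kube-admin --zone ch-gva-2 > galileo.kubeconfig
export KUBECONFIG="$PWD/galileo.kubeconfig"
kubectl get nodes
```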
```sh
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/1.3.1/deploy/longhorn.yaml
```
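Once the manifests are applied, you can confirm the install before moving on (the `longhorn-system` namespace and `longhorn` storage class are created by the manifest above):

```bash
kubectl -n longhorn-system get pods
kubectl get storageclass
```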

## Required Configuration Values

Customer-specific cluster values (e.g. domain name, Slack channel for notifications, etc.) will be placed in a base64 encoded string, stored as a secret in GitHub, that Galileo's deployment automation will read in and use when templating a cluster's resource files.

| Mandatory Field | Description |
| --- | --- |
| **SKS Cluster Name** | The SKS cluster name |
| **Galileo runner instance pool ID** | SKS galileo-runner instance pool ID |
| **Exoscale API keys** | Exoscale EXOSCALE\_API\_KEY and EXOSCALE\_API\_SECRET with Object Storage Buckets permissions: create, get, list |
| **Exoscale storage host** | e.g. sos-ch-gva-2.exo.io |
| **Domain Name** | The domain name the customer wishes to deploy the cluster under, e.g. google.com |
| **Root subdomain** | e.g. "galileo" as in galileo.google.com |
| **Trusted SSL Certificates (Optional)** | By default, Galileo provisions Let's Encrypt certificates. But if you wish to use your own trusted SSL certificates, you should submit a base64 encoded string of (1) the full certificate chain, and (2) another, separate base64 encoded string of the signing key. |

## Access to Deployment Logs

As a customer, you have full access to the deployment logs in Google Cloud Storage. You (the customer) are able to view all configurations there. A customer email address must be provided to have access to this log.

## Customer DNS Configuration

Galileo has 4 main URLs (shown below). To make the URLs accessible across the company, you have to set the following DNS addresses in your DNS provider after the platform is deployed.

| Service | URL |
| --- | --- |
| API | **api.galileo**.company.\[com\|ai\|io…] |
| Data | **data.galileo**.company.\[com\|ai\|io…] |
| UI | **console.galileo**.company.\[com\|ai\|io…] |
| Grafana | **grafana.galileo**.company.\[com\|ai\|io…] |

# Deploying Galileo on Google GKE

Deploy Galileo on Google Kubernetes Engine (GKE) with this guide, covering configuration steps, cluster setup, and infrastructure scaling strategies.

## Setting Up Your Kubernetes Cluster for Galileo Applications on Google Kubernetes Engine (GKE)

Welcome to your guide on configuring and deploying a Google Kubernetes Engine (GKE) environment optimized for Galileo applications. Galileo, tailored for dynamic and scalable deployments, requires a robust and adaptable infrastructure, qualities inherent to Kubernetes. This guide will navigate you through the preparatory steps involving Identity and Access Management (IAM) and the DNS setup crucial for integrating Galileo's services.

### Prerequisites

Before diving into the setup, ensure you have the following:

* A Google Cloud account.
* The Google Cloud SDK installed and initialized.
* Kubernetes command-line tool (`kubectl`) installed.
* Basic familiarity with GKE, IAM roles, and Kubernetes concepts.

### Setting Up IAM

Identity and Access Management (IAM) plays a critical role in securing and granting the appropriate permissions for your Kubernetes cluster. Here's how to configure IAM for your GKE environment:

1. **Create a Project**: Sign in to your Google Cloud Console and create a new project for your Galileo application if you haven't done so already.
2. **Set Up IAM Roles**: Navigate to the IAM & Admin section in the Google Cloud Console. Here, assign the necessary roles to your Google Cloud account, ensuring you have rights for GKE administration. Essential roles include `roles/container.admin` (for managing clusters), `roles/iam.serviceAccountUser` (to use service accounts with your clusters), and any other roles specific to your operational needs.
3. **Configure Service Accounts**: Create a service account dedicated to your GKE cluster to segregate duties and enhance security. Assign the service account the minimal roles necessary to operate your Galileo applications efficiently.

### Configuring DNS for Galileo

Your Galileo application requires four DNS endpoints for optimal functionality. These endpoints handle different aspects of the application's operations and need to be properly set up:

1. **Acquire a Domain**: If not already owned, purchase a domain name that will serve as the base URL for Galileo.
2. **Set Up DNS Records**: Utilize your domain registrar's DNS management tools to create four DNS A records pointing to the Galileo application's operational endpoints. These records will route traffic correctly within your GKE environment. More details in the [Step 3: Customer DNS Configuration](/galileo/how-to-and-faq/enterprise-only/deploying-galileo-gke#step-3-customer-dns-configuration) section.
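For example, if the domain happens to be managed in Google Cloud DNS, a record might be created along these lines (zone name and load-balancer address are placeholders; any DNS provider works equally well):

```bash
# Point api.galileo.<your-domain> at the ingress load balancer address.
gcloud dns record-sets create "api.galileo.example.com." \
  --zone="your-managed-zone" --type="A" --ttl="300" \
  --rrdatas="203.0.113.10"
```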
### Deploying Your Cluster on GKE

With IAM configured and DNS set up, you're now ready to deploy your Kubernetes cluster on GKE.

1. **Create the Cluster**: Use the `gcloud` command-line tool to create your cluster. Ensure that it is configured with the correct machine type, node count, and other specifications suitable for your Galileo application needs.
2. **Deploy Galileo**: With your cluster running, deploy your Galileo application. Employ `kubectl` to manage resources and deploy services necessary for your application.
3. **Verify Deployment**: After deployment, verify that your Galileo application is running smoothly by checking the service status and ensuring that external endpoints are reachable.

**Total time for deployment:** 30-45 minutes

**This deployment requires the use of Google Cloud's CLI, `gcloud`. Please follow** [**these instructions**](https://cloud.google.com/sdk/docs/install) **to install and set up gcloud for your GCP account.**

### Recommended Cluster Configuration

Galileo recommends the following Kubernetes deployment configuration. These details are captured in the bootstrap script Galileo provides.

| Configuration | Recommended Value |
| --- | --- |
| **Nodes in the cluster's core nodegroup** | 4 (min) 5 (max) 4 (desired) |
| **CPU per core node** | 4 CPU |
| **RAM per core node** | 16 GiB RAM |
| **Number of nodes in the cluster's runners nodegroup** | 1 (min) 5 (max) 1 (desired) |
| **CPU per runner node** | 8 CPU |
| **RAM per runner node** | 32 GiB RAM |
| **Minimum volume size per node** | 200 GiB |
| **Required Kubernetes API version** | 1.21 |
| **Storage class** | standard |

### Step 0: Deploying the GKE Cluster

Run [this script](https://docs.rungalileo.io/galileo/how-to-and-faq/enterprise-only/deploying-galileo-gke/galileo-gcp-setup-script) as instructed. If you have specialized tasks that require GPU processing, make sure `CREATE_ML_NODE_POOL=true` is set before running the script. If you have any questions, please reach out to a Galilean in the Slack channel Galileo shares with you and your team.

### **Step 1: Required Configuration Values**

Customer-specific cluster values (e.g. domain name, Slack channel for notifications, etc.) will be placed in a base64 encoded string, stored as a secret in GitHub, that Galileo's deployment automation will read in and use when templating a cluster's resource files.

**Mandatory fields the Galileo team requires:**

| Mandatory Field | Description |
| --- | --- |
| **GCP Account ID** | The customer's GCP Account ID that the customer will use for provisioning Galileo |
| **Customer GCP Project Name** | The name of the GCP project the customer is using to provision Galileo. |
| **Customer Service Account Address for Galileo** | The service account address the customer has created for the Galileo deployment account to assume. |
| **GKE Cluster Name** | The GKE cluster name that Galileo will deploy the platform to. |
| **Domain Name** | The domain name the customer wishes to deploy the cluster under, e.g. google.com |
| **GKE Cluster Region** | The region of the cluster. |
| **Root subdomain** | e.g. "galileo" as in galileo.google.com |
| **Trusted SSL Certificates (Optional)** | By default, Galileo provisions Let's Encrypt certificates. But if you wish to use your own trusted SSL certificates, you should submit a base64 encoded string of (1) the full certificate chain, and (2) another, separate base64 encoded string of the signing key. |

### Step 2: Access to Deployment Logs

As a customer, you have full access to the deployment logs in Google Cloud Storage. You (the customer) are able to view all configuration there. A customer email address must be provided to have access to this log.

### **Step 3: Customer DNS Configuration**

Galileo has 4 main URLs (shown below). To make the URLs accessible across the company, you have to set the following DNS addresses in your DNS provider after the platform is deployed.

**Time taken:** 5-10 minutes (post the ingress endpoint / load balancer provisioning)

| Service | URL |
| --- | --- |
| API | **api.galileo**.company.\[com\|ai\|io…] |
| Data | **data.galileo**.company.\[com\|ai\|io…] |
| UI | **console.galileo**.company.\[com\|ai\|io…] |
| Grafana | **grafana.galileo**.company.\[com\|ai\|io…] |

### Step 4: Post-deployment health checks

#### Set up a Firewall Rule for the Horizontal Pod Autoscaler

On GKE, only a few ports allow inbound traffic by default. Unfortunately, this breaks our HPA setup. You can run `kubectl -n galileo get hpa` and check for `unknown` values to confirm this. To fix it, follow the steps below (a command sketch follows the list):

1. Go to the `Firewall policies` page on the GCP console, and click `CREATE FIREWALL RULE`.
2. Set `Target tags` to the [network tags of the GCE VMs](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#gke_private_clusters_10-). You can find the tags on the GCE instance detail page.
3. Set `source IPv4 ranges` to the range that includes the cluster internal endpoint, which can be found on the cluster basics page ([link](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#step_1_view_control_planes_cidr_block)).
4. Allow TCP port 6443.
5. After creating the firewall rule, wait for a few minutes, and rerun `kubectl -n galileo get hpa` to confirm the `unknown` values are gone.
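A sketch of the same rule via `gcloud` (the rule name, target tag, and source CIDR are placeholders; use your nodes' network tag and your control plane's CIDR block):

```bash
# Allow the GKE control plane CIDR to reach the metrics endpoints on TCP 6443.
gcloud compute firewall-rules create galileo-hpa-allow \
  --network="default" --direction=INGRESS --action=ALLOW \
  --rules=tcp:6443 \
  --source-ranges="172.16.0.0/28" \
  --target-tags="your-gke-node-network-tag"
```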
# Cluster Setup Script

Utilize the Galileo GCP setup script for automating Google Cloud Platform (GCP) configuration to deploy Galileo seamlessly on GKE clusters.

```Bash
#!/bin/bash -e
#
# Usage
# CUSTOMER_NAME=customer-name REGION=us-central1 ZONE_ID=a CREATE_ML_NODE_POOL=false ./bootstrap.sh

if [ -z "$CUSTOMER_NAME" ]; then
  echo "Error: CUSTOMER_NAME is not set"
  exit 1
fi

PROJECT="$CUSTOMER_NAME-galileo"
REGION=${REGION:="us-central1"}
ZONE_ID=${ZONE_ID:="c"}
ZONE="$REGION-$ZONE_ID"
CLUSTER_NAME="galileo"

echo "Bootstrapping cluster with the following parameters:"
echo "PROJECT: ${PROJECT}"
echo "REGION: ${REGION}"
echo "ZONE: ${ZONE}"
echo "CLUSTER_NAME: ${CLUSTER_NAME}"

#
# Create a project for Galileo.
#
echo "Create a project for Galileo."
gcloud projects create $PROJECT || true

#
# Enabling services as referenced here https://cloud.google.com/migrate/containers/docs/config-dev-env#enabling_required_services
#
echo "Enabling services as referenced here https://cloud.google.com/migrate/containers/docs/config-dev-env#enabling_required_services"
gcloud services enable --project=$PROJECT servicemanagement.googleapis.com servicecontrol.googleapis.com cloudresourcemanager.googleapis.com compute.googleapis.com container.googleapis.com containerregistry.googleapis.com cloudbuild.googleapis.com

#
# Grab the project number.
#
echo "Grab the project number."
PROJECT_NUMBER=$(gcloud projects describe $PROJECT --format json | jq -r -c .projectNumber)

#
# Create service accounts and policy bindings.
#
echo "Create service accounts and policy bindings."
gcloud iam service-accounts create galileoconnect \
  --project "$PROJECT"
gcloud iam service-accounts add-iam-policy-binding galileoconnect@$PROJECT.iam.gserviceaccount.com \
  --project "$PROJECT" \
  --member "group:devs@rungalileo.io" \
  --role "roles/iam.serviceAccountUser"
gcloud iam service-accounts add-iam-policy-binding galileoconnect@$PROJECT.iam.gserviceaccount.com \
  --project "$PROJECT" \
  --member "group:devs@rungalileo.io" \
  --role "roles/iam.serviceAccountTokenCreator"
gcloud projects add-iam-policy-binding $PROJECT --member="serviceAccount:galileoconnect@$PROJECT.iam.gserviceaccount.com" --role="roles/container.admin"
gcloud projects add-iam-policy-binding $PROJECT --member="serviceAccount:galileoconnect@$PROJECT.iam.gserviceaccount.com" --role="roles/container.clusterViewer"

#
# Waiting before provisioning workload identity.
#
echo "Waiting before provisioning workload identity..."
sleep 5

#
# Create a workload identity pool.
#
echo "Create a workload identity pool."
gcloud iam workload-identity-pools create galileoconnectpool \
  --project "$PROJECT" \
  --location "global" \
  --description "Workload ID Pool for Galileo via GitHub Actions" \
  --display-name "GalileoConnectPool"

#
# Create a workload identity provider.
#
echo "Create a workload identity provider."
gcloud iam workload-identity-pools providers create-oidc galileoconnectprovider \
  --project "$PROJECT" \
  --location "global" \
  --workload-identity-pool "galileoconnectpool" \
  --display-name "GalileoConnectProvider" \
  --attribute-mapping="google.subject=assertion.sub,attribute.actor=assertion.actor,attribute.aud=assertion.aud,attribute.repository_owner=assertion.repository_owner,attribute.repository=assertion.repository" \
  --issuer-uri="https://token.actions.githubusercontent.com"

#
# Bind the service account to the workload identity provider.
#
echo "Bind the service account to the workload identity provider."
gcloud iam service-accounts add-iam-policy-binding "galileoconnect@${PROJECT}.iam.gserviceaccount.com" \
  --project "$PROJECT" \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/galileoconnectpool/attribute.repository/rungalileo/deploy"

#
# Create the cluster (with one node pool) and the runners node pool.
# The network config below assumes you have a default VPC in your account.
# If you want to use a different VPC, please update the option values for
# `--network` and `--subnetwork` below.
#
echo "Create the cluster (with one node pool) and the runners node pool."
gcloud beta container \
  --project $PROJECT clusters create $CLUSTER_NAME \
  --zone $ZONE \
  --no-enable-basic-auth \
  --cluster-version "1.27" \
  --release-channel "regular" \
  --machine-type "e2-standard-4" \
  --image-type "cos_containerd" \
  --disk-type "pd-standard" \
  --disk-size "300" \
  --node-labels galileo-node-type=galileo-core \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --max-pods-per-node "110" \
  --num-nodes "4" \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-ip-alias \
  --network "projects/$PROJECT/global/networks/default" \
  --subnetwork "projects/$PROJECT/regions/$REGION/subnetworks/default" \
  --no-enable-intra-node-visibility \
  --default-max-pods-per-node "110" \
  --enable-autoscaling \
  --min-nodes "4" \
  --max-nodes "5" \
  --no-enable-master-authorized-networks \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --enable-autoupgrade \
  --enable-autorepair \
  --max-surge-upgrade 1 \
  --max-unavailable-upgrade 0 \
  --enable-autoprovisioning \
  --min-cpu 0 \
  --max-cpu 50 \
  --min-memory 0 \
  --max-memory 200 \
  --autoprovisioning-scopes=https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring \
  --enable-autoprovisioning-autorepair \
  --enable-autoprovisioning-autoupgrade \
  --autoprovisioning-max-surge-upgrade 1 \
  --autoprovisioning-max-unavailable-upgrade 0 \
  --enable-shielded-nodes \
  --node-locations $ZONE \
  --enable-network-policy

gcloud beta container \
  --project $PROJECT node-pools create "galileo-runners" \
  --cluster $CLUSTER_NAME \
  --zone $ZONE \
  --machine-type "e2-standard-8" \
  --image-type "cos_containerd" \
  --disk-type "pd-standard" \
  --disk-size "100" \
  --node-labels galileo-node-type=galileo-runner \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --num-nodes "1" \
  --enable-autoscaling \
  --min-nodes "1" \
  --max-nodes "5" \
  --enable-autoupgrade \
  --enable-autorepair \
  --max-surge-upgrade 1 \
  --max-unavailable-upgrade 0 \
  --max-pods-per-node "110" \
  --node-locations $ZONE

if [[ -n "$CREATE_ML_NODE_POOL" && "$CREATE_ML_NODE_POOL" == "true" ]]; then
  gcloud beta container \
    --project $PROJECT node-pools create "galileo-ml" \
    --cluster $CLUSTER_NAME \
    --zone $ZONE \
    --machine-type "g2-standard-8" \
    --image-type "cos_containerd" \
    --disk-type "pd-standard" \
    --disk-size "100" \
    --node-labels galileo-node-type=galileo-ml \
    --metadata disable-legacy-endpoints=true \
    --scopes "https://www.googleapis.com/auth/devstorage.read_write","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
    --num-nodes "1" \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest \
    --node-locations $ZONE \
    --enable-autoscaling \
    --enable-autoupgrade \
    --enable-autorepair \
    --max-surge-upgrade 1 \
    --max-unavailable-upgrade 0 \
    --max-pods-per-node "110" \
    --min-nodes 1 \
    --max-nodes 5
fi
```

# Enterprise Deployment

Gain an overview of Galileo deployment options, covering supported platforms like Amazon EKS and Google GKE, setup requirements, and best practices.

Tutorials and walkthroughs of enterprise-only features. Jump to a guide for the task you're trying to complete.

# Post Deployment Checklist

The following guide walks you through steps you can take to make sure your Galileo cluster is properly deployed and running well. *This guide applies to all cloud providers.*

### 1. Confirm that all DNS records have been created.

Galileo will not set DNS records for your cluster, so you need to set them appropriately for your company. Each record should have a TTL of 60 seconds or less.

If you are letting Galileo provision Let's Encrypt certificates for you automatically with cert-manager, it's important to make sure that all of cert-manager's http solvers have told Let's Encrypt to provision a certificate with all of the domains specified for the cluster (i.e. `api|console|data|grafana.my-cluster.my-domain.com`):

```
kubectl get ingress -n galileo | grep -i http-solver
```

If this command produces no output, the solvers have finished. You can confirm by visiting any of the domains for your cluster.
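You can also inspect the certificate state directly; this is a hedged check that assumes cert-manager's `Certificate` resources live in the `galileo` namespace, and the domain below is illustrative.

```bash
# The READY column should be True for each Certificate.
kubectl get certificate -n galileo
# Inspect the issuer of the certificate actually being served (domain is illustrative).
curl -vI https://console.my-cluster.my-domain.com 2>&1 | grep -i issuer
```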
### 2. Check the API's health-check.

```
curl -I -X GET https://api.my-cluster.my-domain.com/healthcheck
```

If the response is a 200, then this is a good sign that almost everything is up and running as expected.

### 3. Check for unready pods.

```
kubectl get pods --all-namespaces -o go-template='{{ range $item := .items }}{{ range .status.conditions }}{{ if (or (and (eq .type "PodScheduled") (eq .status "False")) (and (eq .type "Ready") (eq .status "False"))) }}{{ $item.metadata.name}} {{ end }}{{ end }}{{ end }}'
```

If any pods are in an unready state, especially in the namespace where the Galileo platform was deployed, please notify the appropriate representative from Galileo and they will help to solve the issue.

### 4. Check for pending persistent volume claims.

```
kubectl get pvc --all-namespaces | grep -i pending
```

If any persistent volume claims are in a pending state, especially in the namespace where the Galileo platform was deployed, please notify the appropriate representative from Galileo and they will help to solve the issue.

### 5. Clickhouse keeper fails to start

```
kubectl get sts --all-namespaces | grep -i clickhouse-keeper
```

If there is a statefulset `clickhouse-keeper` with zero ready replicas, the Kubernetes version is incompatible. Please take the following steps:

1. Upgrade the Kubernetes version (control plane + node groups) to at least 1.30.
2. Delete the broken CRD with `kubectl delete crd clickhousekeeperinstallations.clickhouse-keeper.altinity.com`.
3. Delete the clickhouse operator with `kubectl delete deploy clickhouse-operator`.
4. Re-apply the manifest.
5. Wait for 2 minutes, then confirm that 3 clickhouse keeper statefulsets `chk-clickhouse-keeper-cluster` are up with `kubectl get sts --all-namespaces | grep -i clickhouse-keeper`.
6. If you still see an unhealthy statefulset `clickhouse-keeper` alongside those 3, clean up the statefulset and its PVC with `kubectl delete sts clickhouse-keeper && kubectl delete pvc data-volume-claim-clickhouse-keeper-0`.

# Pre Requisites

Before deploying Galileo, ensure the following prerequisites are met.

* The ability to create a Kubernetes cluster.
* The `kubectl` command-line tool is installed and configured to interact with your cluster.
* Kubernetes version 1.21 or higher installed on your cluster, as Galileo requires specific Kubernetes API functionalities.

# Scheduling Automatic Backups For Your Cluster

Schedule automatic backups for Galileo clusters with this guide, ensuring data security, disaster recovery, and operational resilience for deployments.

### Velero

Velero is a convenient backup tool for Kubernetes clusters that compresses and backs up Kubernetes objects to object storage. It also takes snapshots of your cluster's Persistent Volumes using your cloud provider's block storage snapshot features, and can then restore your cluster's objects and Persistent Volumes to a previous state. See the [Velero documentation](https://velero.io/docs/v1.9/) for details.

### Installing the Velero CLI

MacOS:

```
brew install velero
```

Linux:

```
INSTALL_PATH='/usr/local/bin'
wget -O velero.tar.gz https://github.com/vmware-tanzu/velero/releases/download/v1.9.2/velero-v1.9.2-linux-amd64.tar.gz
tar -xvf velero.tar.gz && cd velero-v1.9.2-linux-amd64 && mv velero $INSTALL_PATH && chmod +x ${INSTALL_PATH}/velero
```

### Prerequisites

Before setting up the Velero components, you will need to prepare your AWS/GCP object storage, secrets, and a dedicated user with access to the resources required to perform a backup. The instructions below will guide you.

### AWS EKS: Installing Velero

[AWS Setup Script](/galileo/how-to-and-faq/enterprise-only/scheduling-automatic-backups-for-your-cluster/aws-velero-account-setup-script)

Create the S3 bucket:

```
aws s3api create-bucket \
    --bucket <BUCKET_NAME> \
    --region <AWS_REGION> \
    --create-bucket-configuration LocationConstraint=<AWS_REGION>
```

1. Create an IAM user and attach an IAM policy with the necessary permissions (the policy statements are elided here):

```
aws iam create-user --user-name velero

cat > velero-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        ...
    ]
}
EOF

aws iam put-user-policy \
    --user-name velero \
    --policy-name velero \
    --policy-document file://velero-policy.json
```

2. Create an access key for the user:

```
aws iam create-access-key --user-name velero
```

3. Create a Velero-specific credentials file using the access key from the previous step:

```
cat > credentials-velero <<EOF
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
EOF
```

All the steps above are included in the [AWS velero account setup script](/galileo/how-to-and-faq/enterprise-only/scheduling-automatic-backups-for-your-cluster/aws-velero-account-setup-script)

4. Install velero

The `velero install` command will perform the setup steps to get the cluster ready for backups.

```
velero install \
    --provider aws \
    --backup-location-config region=<AWS_REGION> \
    --snapshot-location-config region=<AWS_REGION> \
    --bucket velero-backups \
    --plugins velero/velero-plugin-for-aws:v1.4.0 \
    --secret-file ./credentials-velero
```
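Once the install finishes, it's worth confirming that Velero can reach the bucket before relying on it; this check is a suggestion rather than part of the original steps.

```bash
# The backup storage location should report "Available" once credentials and bucket access are correct.
velero backup-location get
```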
### GCP GKE: Installing Velero

[GCP Setup Script](/galileo/how-to-and-faq/enterprise-only/scheduling-automatic-backups-for-your-cluster/gcp-velero-account-setup-script)

1. Create the GCS bucket:

```
gsutil mb gs://<BUCKET_NAME>/
```

2. Create the Google Service Account (GSA):

```
gcloud iam service-accounts create velero \
    --display-name "Velero service account"
```

3. Create a custom role with the permissions Velero needs:

```
ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
    storage.objects.create
    storage.objects.delete
    storage.objects.get
    storage.objects.list
)

PROJECT_ID=$(gcloud config get-value project)

SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
    --filter="displayName:Velero service account" \
    --format 'value(email)')

gcloud iam roles create velero.server \
    --project $PROJECT_ID \
    --title "Velero Server" \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/velero.server

gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://<BUCKET_NAME>

gcloud iam service-accounts keys create credentials-velero \
    --iam-account $SERVICE_ACCOUNT_EMAIL
```

All the steps above are included in the [GCP velero account setup script](/galileo/how-to-and-faq/enterprise-only/scheduling-automatic-backups-for-your-cluster/gcp-velero-account-setup-script)

4. Install velero:

```
velero install \
    --provider gcp \
    --bucket velero-backups \
    --plugins velero/velero-plugin-for-gcp:v1.5.0 \
    --secret-file ./credentials-velero
```

#### Backups

Set up daily backups:

```
velero schedule create daily-backup --schedule "0 7 * * *"

# Take initial backup
velero backup create --from-schedule daily-backup

# Get backup list
velero backup get
NAME                          STATUS      ERRORS   WARNINGS   CREATED                          EXPIRES   STORAGE LOCATION   SELECTOR
daily-backup-20221004070030   Completed   0        0          2022-10-04 09:00:30 +0200 CEST   29d       default            <none>
daily-backup-20221003193617   Completed   0        0          2022-10-03 21:36:30 +0200 CEST   29d       default            <none>
```

#### Restore from backup

NOTE: Existing cluster resources will not be overwritten by the restoration process. To restore a PV, delete it from the cluster before running the restore command.

```
velero restore create --from-backup daily-backup-20221003193617
```

NOTE: All DNS entries have to be updated after a restore, as velero does not persist the ingress IP / load balancer names.
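To monitor a restore after kicking it off, the following can help; the restore name is whatever `velero restore get` reports.

```bash
# Restores are typically named <backup-name>-<timestamp>; list them, then inspect one.
velero restore get
velero restore describe <RESTORE_NAME>
```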
# Aws Velero Account Setup Script

Automate AWS Velero setup for Galileo cluster backups with this script, ensuring seamless backup scheduling and data resilience for AWS deployments.

```
#!/bin/sh -e
#
# Usage
# ./velero-account-setup-aws.sh <BUCKET_NAME> <AWS_REGION>
#

print_usage() {
    echo -e "\n Usage: \n ./velero-account-setup-aws.sh <BUCKET_NAME> <AWS_REGION> \n"
}

BUCKET="${1}"
AWS_REGION="${2}"

if [ $# -ne 2 ]; then
    print_usage
    exit 1
fi

aws s3api create-bucket \
    --bucket $BUCKET \
    --region $AWS_REGION \
    --create-bucket-configuration LocationConstraint=$AWS_REGION \
    --no-cli-pager

aws iam create-user --user-name velero --no-cli-pager

# IAM policy for Velero (the policy statements are elided here).
cat > velero-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        ...
    ]
}
EOF

aws iam put-user-policy \
    --user-name velero \
    --policy-name velero \
    --policy-document file://velero-policy.json

# Create an access key and place its values in the credentials file below.
aws iam create-access-key --user-name velero --no-cli-pager

cat > credentials-velero <<EOF
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
EOF
```

# Gcp Velero Account Setup Script

Automate GCP Velero setup for Galileo cluster backups with this script, ensuring seamless backup scheduling and data resilience for GCP deployments.

```
#!/bin/bash -e
#
# Usage
# ./velero-account-setup-gcp.sh <BUCKET_NAME>
#

GSA_NAME=velero
ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
    storage.objects.create
    storage.objects.delete
    storage.objects.get
    storage.objects.list
)

print_usage() {
    echo -e "\n Usage: \n ./velero-account-setup-gcp.sh <BUCKET_NAME> \n"
}

BUCKET="${1}"

if [ -z "$BUCKET" ]; then
    print_usage
    exit 1
fi

gsutil mb gs://$BUCKET

PROJECT_ID=$(gcloud config get-value project)

gcloud iam service-accounts create $GSA_NAME \
    --display-name "Velero service account"

SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
    --filter="displayName:Velero service account" \
    --format 'value(email)')

gcloud iam roles create velero.server \
    --project $PROJECT_ID \
    --title "Velero Server" \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/velero.server

gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET}

gcloud iam service-accounts keys create credentials-velero \
    --iam-account $SERVICE_ACCOUNT_EMAIL
```
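After running either script, a quick sanity check that the Velero service account and custom role exist might look like this (a suggested check, not part of the scripts):

```bash
# Confirm the Velero GSA and custom role were created (GCP example).
gcloud iam service-accounts list --filter="displayName:Velero service account"
gcloud iam roles describe velero.server --project "$(gcloud config get-value project)"
```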
# Security & Access Control

This page covers networking, security and access control provisions that Galileo deployments enable.

### Networking / Firewalls

#### Air-Gapped Deployments

Galileo's fully air-gapped deployments provide enterprises with a solution for deploying Kubernetes clusters in non-cloud environments, enabling them to securely and efficiently run their applications within their own enterprise networks or VPCs, without the need for external connectivity or reliance on cloud infrastructure.

With air-gapped deployments, organizations maintain complete control and autonomy over their Kubernetes clusters, ensuring the utmost security, privacy, and compliance with internal policies and regulations. This eliminates the need for internet connectivity or external dependencies, making it suitable for sensitive environments where data integrity and confidentiality are paramount. The cluster remains isolated from external networks, minimizing the potential attack surface. All components, including master nodes, worker nodes, and control plane components, operate solely within the confines of the enterprise network or VPC.

#### Configurable Ingress / Egress

Galileo's endpoints and load-balancers can be customized during deployment to handle various combinations of limited access to both internal and external environments. This includes all combinations of ingress and egress to both types of environments.

### Access Control

By default, Galileo deployments keep all projects and runs private (visible only to the user who creates them), with invite-only sharing turned on. Galileo also has 2 default roles: Admin and User. Admins have the ability to grant and revoke user access.

Galileo provides configurable access-control mechanisms (role-based access) for enterprises and teams with custom access requirements.

# Setting Up New Users

Learn how to onboard new users in Galileo deployments with detailed instructions on user roles, access control, and permissions management.

### What is a Galileo User?

Each person has their own account with Galileo with their own login credentials. Each Galileo User needs to provide their own credentials to the Galileo client when training their models, so that their runs are logged under, and visible in, their individual Galileo console.

### How to create the Admin User?

You should have an Admin User created during the deployment step. If we did not create one, the Galileo console will prompt you to create the Admin User first.

### How to add a new user?

The Admin User has the ability to invite users to set up their own accounts with Galileo.

### How to manage user permissions?

Go to "Settings & Permissions" to manage your users and groups. Check out this How To guide on defining [Access Controls](/galileo/gen-ai-studio-products/galileo-evaluate/how-to/access-control).

# SSO Integration

This page covers our SSO Integration support with the information we need to set up SSO for your Galileo cluster.

# Single Sign On

Galileo provides Single Sign-on capabilities for various providers using the OIDC protocol. See details below for how to configure each provider.

| Provider | Integration |
| ---------------------- | ----------- |
| Okta | OIDC |
| Azure Active Directory | OIDC |
| PingFederate | OIDC |
| Google | OIDC |
| Github | OIDC |
| Custom OIDC provider | OIDC |

If your provider is not listed above, additional SSO providers can be added on-demand as per requirements.

## Setting Up SSO with Galileo

### Google

1. Follow [this guide](https://support.google.com/cloud/answer/6158849?hl=en#zippy=) to set up **OAuth credentials**. **User Type** is **Internal**, **Scopes** are **.../auth/userinfo.profile** and **openid**, and **Authorized domains** is your domain for the Galileo console.
2. When creating the new client ID, set **type** to **Web application** and **Authorized redirect URIs** to `https://{CONSOLE_URL}/api/auth/callback/google`.
3. Share the **Client ID** and **Client Secret** with Galileo.

### Okta

1. Follow [this guide](https://help.okta.com/en-us/content/topics/apps/apps_app_integration_wizard_oidc.htm) to create a new application. Select **OIDC - OpenID Connect** as the **Sign-in method**, **Web Application** as the application type, and **Authorization Code** as the **Grant Type**.
2. Set **Sign-in redirect URIs** to `https://{CONSOLE_URL}/api/auth/callback/okta`, and **Sign-out redirect URIs** to `https://{CONSOLE_URL}`.
3. Share the **Issuer URL**, **Client ID** and **Client Secret** with Galileo.
   1. Find the **Issuer URL** under Security -> API in the admin panel. The audience should be `api://default`.

### Microsoft Entra ID (formerly Azure Active Directory)

1. Follow [this guide](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app) to create a new application. Under **Redirect URI**, set the type to **Web** and the URI to `https://{CONSOLE_URL}/api/auth/callback/azure-ad`.
2. Go to the **Token configuration** page, click **Add Optional Claim**, and choose the **ID** token and the **email** claim.
   1. Please ensure each user has the **email** set in the **Contact Information** properties. We will use this email as the account on Galileo.
3. Go to the **Certificates & secrets** page, click **New Client Secret**, and create a new secret.
4. Share the **Tenant ID**, **Client ID** and **Client Secret** with Galileo.

### PingFederate

1. Follow [this guide](https://docs.pingidentity.com/r/en-us/pingone/pingone_edit_application_oidc) to create an application with Application Type **OIDC Web App**.
2. Go to the app **configuration** page and edit it, setting **Redirect URIs** to `https://{CONSOLE_URL}/api/auth/callback/ping-federate`.
3. Share the **Environment ID**, **Client ID** and **Client Secret** with Galileo.

### Custom OIDC Provider

1. Create an application/client with **OIDC** as the protocol, **Web Application** as the application type, and **Authorization Code** as the Grant Type.
   1. Please ensure the **email** claim is returned as part of the **ID Token**.
2. Set **Sign-in redirect URIs** to `https://{CONSOLE_URL}/api/auth/callback/custom`, **Sign-out redirect URIs** to `https://{CONSOLE_URL}`, and **Web origins** to `https://{CONSOLE_URL}`.
3. Create a **Client Secret**.
4. Share all of these with Galileo:
   1. CLIENT\_ID
   2. CLIENT\_SECRET
   3. TOKEN\_URL (like `https://{BASE_URL}/token`)
   4. USERINFO\_URL (like `https://{BASE_URL}/userinfo`)
   5. ISSUER
   6. JWKS\_URL (like `https://{BASE_URL}/certs`)
   7. AUTHORIZATION\_URL (like `https://{BASE_URL}/auth?response_type=code`)
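Most OIDC providers publish these values in their discovery document, so one shortcut, assuming your provider supports the standard well-known endpoint, is to fetch them directly:

```bash
# The discovery document includes issuer, token_endpoint, userinfo_endpoint, jwks_uri, and authorization_endpoint.
curl -s https://{BASE_URL}/.well-known/openid-configuration | jq .
```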
# Examples

Explore Galileo's practical examples covering real-world use cases and workflows for Evaluate, Observe, and Protect modules across AI projects.

In this section, we will guide you through some code examples and provide links directly to the notebooks where you can easily complete the Galileo Evaluate runs end-to-end.

## Evaluate

* Run an evaluation over your *prompts*.
* Run an evaluation over a combination of model, params and prompt templates to prompt engineer your *prompts*.
* Evaluate and compare 3 RAG-based QA Chatbots with OpenAI.
* Evaluation of a RAG-based QA Chatbot built with Langchain and ChromaDB.
* Learn how to register a custom GPT scorer.
## Observe

* Monitor a RAG-based QA Chatbot with OpenAI.
## Protect

## Finetune

## NLP Studio

# What is Galileo?

Evaluate, Observe, and Protect your GenAI applications.

Galileo is the leading Generative AI Evaluation & Observability Stack for the Enterprise.