This article explains how SGS Digicomply uses Generative Artificial Intelligence (AI) from third-party providers in our AI Copilot to enhance your research and insights while prioritizing your data privacy and security.
At SGS Digicomply, we are committed to transparently explaining how we utilize Generative Artificial Intelligence (AI) in our AI Copilot to enhance your research and insights while prioritizing your data privacy and security. This document outlines our current and potential future AI applications and the measures we take to protect your information, including our use of a private, in-house Large Language Model (LLM) for private data and third-party providers for public data.
As AI technology advances, new applications may emerge that could further enhance our product. Regardless of how AI technology evolves, we remain firm in our commitment to:
- Always prioritize the privacy and security of your data.
- Utilize our private, in-house LLM for all private customer data, ensuring it remains within the SGS Digicomply cloud.
- Utilize third-party providers exclusively for public data, solely for the purpose of fulfilling your requests related to that data.
- Never transmit private customer data to third-party providers, and never transmit any data in a manner that could identify our clients.
- Never train our models, nor permit third-party providers to train their models, on customer data.
- Inform you of changes in our use of third-party providers, if applicable.
- Maintain the option for you to opt out of these features.
How We Use Customer Data
At SGS Digicomply, we categorize data as either 'private' or 'public' to ensure the highest level of privacy for your sensitive information.
- Private Data: All data generated directly by the user, including every query, comment, and document uploaded to the system, is considered private. For all private data, we utilize our own instance of an open-source Large Language Model (LLM) hosted securely within the SGS Digicomply cloud. This ensures that your private data never leaves our controlled environment and is never sent to third-party providers. Our private LLM processes your inputs (prompts), documents, labels, signals, comments, and notes to provide accurate, relevant, and contextual responses while maintaining complete data privacy.
- Public Data: For publicly available information or data explicitly designated as public, we may utilize trusted third-party LLM providers. This allows us to leverage external capabilities for broader data processing when privacy concerns are mitigated by the public nature of the data.
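The private/public split described above can be pictured as a simple routing rule. The following is a minimal sketch for illustration only, not SGS Digicomply's actual implementation: the names `Request`, `classify`, `route`, and the backend labels are hypothetical, and the choice to treat anything not clearly public as private is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass(frozen=True)
class Request:
    """Hypothetical request record used only for this illustration."""
    text: str
    user_generated: bool             # queries, comments, uploaded documents, notes
    designated_public: bool = False  # publicly available or explicitly marked public


def classify(request: Request) -> DataClass:
    # Anything the user generated directly is treated as private.
    if request.user_generated:
        return DataClass.PRIVATE
    # Assumption: data that is not clearly public defaults to private.
    if request.designated_public:
        return DataClass.PUBLIC
    return DataClass.PRIVATE


def route(request: Request) -> str:
    # Private data stays with the in-house model; only public data
    # may be sent to a third-party provider.
    if classify(request) is DataClass.PRIVATE:
        return "in_house_llm"
    return "third_party_llm"
```

For example, under these assumptions `route(Request("my uploaded contract", user_generated=True))` selects the in-house model, while a published regulation flagged with `designated_public=True` may be routed to a third-party provider.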
SGS Digicomply may store your inputs and outputs to reduce latency, such as when displaying a post summary, or when storage is required to provide a feature, such as signals within the insights module. While third-party providers (for public data) may have their own data retention policies, we have disabled any use of this data beyond fulfilling your request.
Regardless of the LLM used, customer inputs and outputs are used exclusively to serve and improve individual customer experiences and are not used for model training across customers.
Which Third-Party Providers Do We Use?
For private data (as defined above), we exclusively use our own secure, in-house LLM hosted within the SGS Digicomply cloud. For public data, the LLM providers we utilize do not use your inputs and outputs to improve their services. The list of third-party providers we use includes:
- Google Cloud Generative AI
  - LLMs: Used in various parts of our applications, e.g., to generate summaries and answers.
  - Custom fine-tuned Gemini models: Employed for large-scale extraction tasks. These models are trained using our proprietary training datasets created by SGS experts. We never use client data to train models; however, these models can be used for inference on your data when you enable AI features for them.
- OpenAI
  - LLMs: Used in various parts of our applications, e.g., to generate summaries and answers.
  - Embeddings: Numerical representations of text that facilitate features based on similarity and relevancy.
- Cohere
  - Embeddings: Numerical representations of text that facilitate features based on similarity and relevancy.
  - Reranking: Used to order information by relevancy based on user input.
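To make the idea of embeddings and relevancy ordering more concrete, here is a minimal sketch using toy vectors. It is an illustration only: real embeddings come from a provider's model and have hundreds or thousands of dimensions, provider reranking models are more sophisticated than plain cosine similarity, and the function names and vectors below are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_relevancy(
    query_vec: list[float], doc_vecs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Order document ids from most to least similar to the query vector."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional "embeddings" chosen by hand for the example.
documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank_by_relevancy([1.0, 0.0], documents)
print([doc_id for doc_id, _ in ranking])
```

In this toy case the document whose vector points in the same direction as the query is ranked first, and the one orthogonal to it is ranked last.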