Sanganak Authority

In previous video we saw step by step process on integrating Gemini AI in Azure API Management. In this video we will see how Gemini AI can be rate limited in terms of token consumption using api management. This way it can help us to limit AI cost spikes also.

1. Common limit per client

example 200 TPM per client, applicable for all clients

2. Separate limit per client

example, Client1 – 200TPM, Client2- 500TPM etc.

Part 1 – Integrate Gemini AI in API Management –

https://youtu.be/HNuOF09vq_I

Rate limit Gemini AI using API Management Code Base

https://github.com/kunalchandratre1/azure-gen-ai-gateway

#AzureBeyondDemos #AzureAPIManagement #GeminiAI #GenAIGateway #AzureIntegration #GoogleGemini #AzureBeyondDemos #APIM #GenerativeAI #AzureTutorial #CloudArchitecture #AIIntegration #AzureForAI #GeminiOnAzure #APIMGateway #AzureAI #Microsoft #MicrosoftAzure #Msftadvocate

Make sure that you copy paste below code in a file with extension as .yaml.
#google #gemini #openapi


openapi: 3.0.3
info:
  title: Gemini AI API
  description: |
    The Gemini API allows developers to build generative AI applications using Gemini models.
    Gemini is multimodal and can understand text, images, audio, video, and code.
  version: v1beta
  contact:
    name: Google AI
    url: https://ai.google.dev
  license:
    name: Apache 2.0
    url: https://www.apache.org/licenses/LICENSE-2.0

servers:
  - url: https://generativelanguage.googleapis.com/v1beta
    description: Gemini API Production Server

security:
  - ApiKeyQuery: []
  - ApiKeyHeader: []

tags:
  - name: Models
    description: Operations for listing and retrieving model information
  - name: Content Generation
    description: Generate content using Gemini models
  - name: Embeddings
    description: Generate text embeddings
  - name: Token Counting
    description: Count tokens in prompts

paths:
  /models:
    get:
      tags: [Models]
      operationId: listModels
      summary: List available models
      description: Lists the Gemini models available through the API
      parameters:
        - $ref: '#/components/parameters/PageSize'
        - $ref: '#/components/parameters/PageToken'
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ListModelsResponse'

  /models/{model}:
    get:
      tags: [Models]
      operationId: getModel
      summary: Get model information
      description: Gets information about a specific model
      parameters:
        - name: model
          in: path
          required: true
          schema:
            type: string
          example: gemini-1.5-pro
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Model'

  /models/{model}:generateContent:
    post:
      tags: [Content Generation]
      operationId: generateContent
      summary: Generate content
      description: Generates a model response given an input GenerateContentRequest
      parameters:
        - name: model
          in: path
          required: true
          schema:
            type: string
          description: 'Model name (e.g., models/gemini-1.5-pro)'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateContentRequest'
            examples:
              simpleText:
                summary: Simple text generation
                value:
                  contents:
                    - role: user
                      parts:
                        - text: "Explain quantum computing in simple terms"
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateContentResponse'

  /models/{model}:streamGenerateContent:
    post:
      tags: [Content Generation]
      operationId: streamGenerateContent
      summary: Stream generate content
      description: Generates a streamed response from the model
      parameters:
        - name: model
          in: path
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateContentRequest'
      responses:
        '200':
          description: Successful response (server-sent events)
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateContentResponse'

  /models/{model}:countTokens:
    post:
      tags: [Token Counting]
      operationId: countTokens
      summary: Count tokens
      description: Counts the number of tokens in a prompt
      parameters:
        - name: model
          in: path
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CountTokensRequest'
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/CountTokensResponse'

  /models/{model}:embedContent:
    post:
      tags: [Embeddings]
      operationId: embedContent
      summary: Generate embedding
      description: Generates a text embedding vector from the input content
      parameters:
        - name: model
          in: path
          required: true
          schema:
            type: string
          description: 'Model name (e.g., models/text-embedding-004)'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/EmbedContentRequest'
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/EmbedContentResponse'

  /models/{model}:batchEmbedContents:
    post:
      tags: [Embeddings]
      operationId: batchEmbedContents
      summary: Batch generate embeddings
      description: Generates multiple embedding vectors from a batch of inputs
      parameters:
        - name: model
          in: path
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BatchEmbedContentsRequest'
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BatchEmbedContentsResponse'

components:
  securitySchemes:
    ApiKeyQuery:
      type: apiKey
      in: query
      name: key
      description: API key for authentication
    ApiKeyHeader:
      type: apiKey
      in: header
      name: x-goog-api-key
      description: API key for authentication (preferred)

  parameters:
    PageSize:
      name: pageSize
      in: query
      schema:
        type: integer
        maximum: 1000
      description: Maximum number of results to return
    
    PageToken:
      name: pageToken
      in: query
      schema:
        type: string
      description: Token for pagination

  schemas:
    Model:
      type: object
      properties:
        name:
          type: string
          description: 'Model resource name (e.g., models/gemini-1.5-pro)'
        displayName:
          type: string
        description:
          type: string
        inputTokenLimit:
          type: integer
          format: int32
        outputTokenLimit:
          type: integer
          format: int32
        supportedGenerationMethods:
          type: array
          items:
            type: string
          example: ["generateContent", "countTokens"]

    ListModelsResponse:
      type: object
      properties:
        models:
          type: array
          items:
            $ref: '#/components/schemas/Model'
        nextPageToken:
          type: string

    GenerateContentRequest:
      type: object
      required:
        - contents
      properties:
        contents:
          type: array
          items:
            $ref: '#/components/schemas/Content'
          description: The content of the conversation with the model
        systemInstruction:
          $ref: '#/components/schemas/Content'
          description: Developer set system instructions
        generationConfig:
          $ref: '#/components/schemas/GenerationConfig'
        safetySettings:
          type: array
          items:
            $ref: '#/components/schemas/SafetySetting'

    GenerateContentResponse:
      type: object
      properties:
        candidates:
          type: array
          items:
            $ref: '#/components/schemas/Candidate'
        usageMetadata:
          $ref: '#/components/schemas/UsageMetadata'

    Content:
      type: object
      properties:
        role:
          type: string
          enum: [user, model]
          description: The producer of the content
        parts:
          type: array
          items:
            $ref: '#/components/schemas/Part'

    Part:
      type: object
      description: A part of multi-part content
      properties:
        text:
          type: string
          description: Inline text content
        inlineData:
          $ref: '#/components/schemas/Blob'
        fileData:
          $ref: '#/components/schemas/FileData'

    Blob:
      type: object
      required:
        - mimeType
        - data
      properties:
        mimeType:
          type: string
          example: "image/jpeg"
        data:
          type: string
          format: byte
          description: Base64 encoded data

    FileData:
      type: object
      required:
        - fileUri
      properties:
        mimeType:
          type: string
        fileUri:
          type: string
          description: URI of the file

    Candidate:
      type: object
      properties:
        content:
          $ref: '#/components/schemas/Content'
        finishReason:
          type: string
          enum: [STOP, MAX_TOKENS, SAFETY, RECITATION, OTHER]
        safetyRatings:
          type: array
          items:
            $ref: '#/components/schemas/SafetyRating'

    GenerationConfig:
      type: object
      properties:
        temperature:
          type: number
          format: float
          minimum: 0
          maximum: 2
          description: Controls randomness
        topP:
          type: number
          format: float
        topK:
          type: integer
          format: int32
        maxOutputTokens:
          type: integer
          format: int32
        stopSequences:
          type: array
          items:
            type: string

    SafetySetting:
      type: object
      required:
        - category
        - threshold
      properties:
        category:
          type: string
          enum:
            - HARM_CATEGORY_HARASSMENT
            - HARM_CATEGORY_HATE_SPEECH
            - HARM_CATEGORY_SEXUALLY_EXPLICIT
            - HARM_CATEGORY_DANGEROUS_CONTENT
        threshold:
          type: string
          enum:
            - BLOCK_NONE
            - BLOCK_ONLY_HIGH
            - BLOCK_MEDIUM_AND_ABOVE
            - BLOCK_LOW_AND_ABOVE

    SafetyRating:
      type: object
      properties:
        category:
          type: string
        probability:
          type: string
          enum: [NEGLIGIBLE, LOW, MEDIUM, HIGH]

    UsageMetadata:
      type: object
      properties:
        promptTokenCount:
          type: integer
        candidatesTokenCount:
          type: integer
        totalTokenCount:
          type: integer

    CountTokensRequest:
      type: object
      properties:
        contents:
          type: array
          items:
            $ref: '#/components/schemas/Content'

    CountTokensResponse:
      type: object
      properties:
        totalTokens:
          type: integer

    EmbedContentRequest:
      type: object
      required:
        - content
      properties:
        content:
          $ref: '#/components/schemas/Content'
        taskType:
          type: string
          enum:
            - RETRIEVAL_QUERY
            - RETRIEVAL_DOCUMENT
            - SEMANTIC_SIMILARITY
            - CLASSIFICATION
            - CLUSTERING

    EmbedContentResponse:
      type: object
      properties:
        embedding:
          $ref: '#/components/schemas/ContentEmbedding'

    ContentEmbedding:
      type: object
      properties:
        values:
          type: array
          items:
            type: number
            format: float

    BatchEmbedContentsRequest:
      type: object
      required:
        - requests
      properties:
        requests:
          type: array
          items:
            $ref: '#/components/schemas/EmbedContentRequest'

    BatchEmbedContentsResponse:
      type: object
      properties:
        embeddings:
          type: array
          items:
            $ref: '#/components/schemas/ContentEmbedding'

Sanganak Authority

Pages

Monday, December 15, 2025

Avoid Gemini AI Cost Spikes! Apply Rate Limits to Gemini AI via Azure API Management

Wednesday, November 26, 2025

Integrate Google Gemini AI with Azure API Management | Multi Cloud Gen AI Gateway

Thursday, November 20, 2025

Google Gemini AI - OpenAPI Specification File

Wednesday, October 22, 2025

Azure App Service vs Azure Kubernetes (AKS) | The Practical Comparison

Pageviews last month

Former Microsoft Azure MVP

Contact Form