
Understanding Data Flows Between APIs, Databases, and Cloud Services

December 2025 · 7 min read

One of the most valuable skills I've developed working at Gorkhali Agents is understanding how data moves through modern applications. This knowledge is essential for debugging issues and building reliable systems.

The Typical Data Flow

In most cloud-based applications, data follows a predictable path:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │────▶│     API     │────▶│  Database   │
│  (Request)  │     │  (Process)  │     │  (Storage)  │
└─────────────┘     └─────────────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   Cloud     │
                    │  Services   │
                    │ (Pub/Sub,   │
                    │  Storage)   │
                    └─────────────┘

Understanding this flow helps when debugging. If data isn't appearing where expected, I check each step: Was the request received? Did the API process it correctly? Was it stored in the database?
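
To make this concrete, here's a minimal sketch of one pass through the diagram, assuming FastAPI, asyncpg for PostgreSQL, and the Pub/Sub client library. The project, topic, table, and connection string are placeholders rather than our real setup.

# Minimal sketch of client -> API -> database -> Pub/Sub (placeholder names).
import json
from contextlib import asynccontextmanager

import asyncpg
from fastapi import FastAPI
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "order-events")  # hypothetical topic

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open a connection pool on startup, close it on shutdown
    app.state.pool = await asyncpg.create_pool("postgresql://user:pass@db-host/appdb")
    yield
    await app.state.pool.close()

app = FastAPI(lifespan=lifespan)

@app.post("/orders", status_code=202)
async def create_order(order: dict):
    # API step: process the request (validation would happen here)
    async with app.state.pool.acquire() as conn:
        # Database step: persist the record
        await conn.execute("INSERT INTO orders (payload) VALUES ($1)", json.dumps(order))
    # Cloud services step: hand follow-up work to Pub/Sub for async processing
    publisher.publish(topic_path, json.dumps(order).encode("utf-8"))
    return {"status": "accepted"}

If an order never shows up downstream, each arrow in the diagram is a separate place to check.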

Working with Google Cloud Services

At work, I use several GCP services that handle data (a short sketch of how two of them fit together follows the list):

  • Cloud Run: Hosts our backend services that process API requests
  • Cloud Functions: Handles event-driven processing and webhooks
  • Pub/Sub: Manages message queues for async processing
  • BigQuery: Stores analytical data for reporting
  • Firestore: NoSQL database for real-time data
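
As one example of how these fit together, here's a rough sketch of a small worker that pulls messages from Pub/Sub and streams them into BigQuery. The subscription and table names are made up for illustration.

# Rough sketch: pull messages from Pub/Sub and stream them into BigQuery.
import json

from google.cloud import bigquery, pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "events-sub")  # hypothetical
bq = bigquery.Client()
table_id = "my-project.analytics.events"  # hypothetical dataset.table

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    row = json.loads(message.data.decode("utf-8"))
    errors = bq.insert_rows_json(table_id, [row])  # streaming insert
    if not errors:
        message.ack()   # acknowledge only once the row is safely stored
    else:
        message.nack()  # let Pub/Sub redeliver the message

# Blocks until cancelled; in practice this runs inside a worker service
subscriber.subscribe(subscription_path, callback=callback).result()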

Ensuring Data Integrity

One of my responsibilities is ensuring data is stored and retrieved correctly. This involves several practices:

1. Validating at API Boundaries

# FastAPI example with Pydantic validation
from pydantic import BaseModel, EmailStr, validator
from typing import Optional

class UserCreate(BaseModel):
    email: EmailStr
    name: str
    age: Optional[int] = None
    
    @validator('name')
    def name_must_not_be_empty(cls, v):
        if not v.strip():
            raise ValueError('Name cannot be empty')
        return v.strip()
    
    @validator('age')
    def age_must_be_positive(cls, v):
        if v is not None and v < 0:
            raise ValueError('Age must be positive')
        return v
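
Wired into a route, FastAPI runs this validation before my handler code sees the data, so a bad payload is rejected with a 422 response instead of reaching the database. The endpoint below is just an illustration using the UserCreate model above.

# Hypothetical route: FastAPI validates the body against UserCreate
# and returns a 422 automatically when validation fails.
from fastapi import FastAPI

app = FastAPI()

@app.post("/users", status_code=201)
async def create_user(user: UserCreate) -> dict:
    # At this point email, name, and age have already passed validation
    return {"email": user.email, "name": user.name}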

2. Database Constraints

Application validation isn't enough. Database constraints provide a safety net:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
    -- Ensure email format
    CONSTRAINT valid_email CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
);

-- Note: the UNIQUE constraint above already creates an index on email,
-- so a separate CREATE INDEX on the same column would be redundant.
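
When this safety net catches something the application missed, such as a duplicate email, the API should surface it cleanly. A rough sketch, assuming asyncpg and an existing connection pool (the handler shape is illustrative):

# Sketch: translate a violated UNIQUE constraint into a 409 instead of a 500.
import asyncpg
from fastapi import HTTPException

async def insert_user(pool: asyncpg.Pool, email: str, name: str) -> None:
    try:
        async with pool.acquire() as conn:
            await conn.execute(
                "INSERT INTO users (email, name) VALUES ($1, $2)", email, name
            )
    except asyncpg.exceptions.UniqueViolationError:
        # The database constraint caught a duplicate that app-level checks missed
        raise HTTPException(status_code=409, detail="Email already registered")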

3. Handling Failures Gracefully

import asyncio
import logging
from typing import Optional

import httpx  # assumed async HTTP client; matches the .get()/.json() calls below

log = logging.getLogger(__name__)
http_client = httpx.AsyncClient()  # shared client, reused across requests

async def fetch_with_retry(
    url: str, 
    max_retries: int = 3,
    backoff_factor: float = 2.0
) -> Optional[dict]:
    """Fetch data with exponential backoff retry."""
    
    for attempt in range(max_retries):
        try:
            response = await http_client.get(url)
            response.raise_for_status()
            return response.json()
            
        except Exception as e:
            if attempt == max_retries - 1:
                log.error(f"Failed after {max_retries} attempts: {e}")
                return None
            
            wait_time = backoff_factor ** attempt
            log.warning(f"Attempt {attempt + 1} failed, retrying in {wait_time}s")
            await asyncio.sleep(wait_time)
    
    return None

Debugging Production Issues

When something goes wrong with data, I follow a systematic approach (a small logging sketch that helps with the first two steps comes after the list):

  1. Check the logs: Cloud Logging shows what happened at each step
  2. Trace the request: Follow the data from client to database
  3. Verify the schema: Ensure database schema matches application expectations
  4. Check connectivity: VPC settings and IAM permissions can block access
  5. Review recent changes: Check deployment history for related changes
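
Those first two steps are much easier when every service writes structured JSON logs with a shared request ID, because Cloud Logging parses one-line JSON on stdout into searchable jsonPayload fields. A minimal sketch, with field names that are just my own convention:

# Sketch: structured JSON logs with a request ID make one request
# traceable across services in Cloud Logging.
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("api")

def log_event(step: str, request_id: str, **fields) -> None:
    # One-line JSON on stdout is parsed into jsonPayload by Cloud Logging
    log.info(json.dumps({"step": step, "request_id": request_id, **fields}))

request_id = str(uuid.uuid4())
log_event("request_received", request_id, path="/orders")
log_event("db_write_ok", request_id, table="orders")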

Documentation and Knowledge Sharing

A significant part of my work involves documenting workflows and data-related configurations. Good documentation helps the team:

  • Onboard new team members faster
  • Troubleshoot issues without depending on one person
  • Maintain consistent practices across the team
  • Reduce time spent on repeated questions

Looking Forward: Data Engineering

Understanding these data flows is directly relevant to data engineering. At its core, an ETL pipeline is three stages (sketched in code after the list):

  • Extract: Pull data from various sources (APIs, databases, files)
  • Transform: Clean, validate, and reshape the data
  • Load: Store in a data warehouse for analysis
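
A toy version of such a pipeline, assuming the requests and BigQuery client libraries and made-up endpoint and table names:

# Toy ETL sketch: extract from a JSON API, transform in Python, load into BigQuery.
import requests
from google.cloud import bigquery

def extract(url: str) -> list[dict]:
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()

def transform(rows: list[dict]) -> list[dict]:
    # Drop rows without an id and normalize the email field
    return [
        {"id": r["id"], "email": r.get("email", "").strip().lower()}
        for r in rows
        if "id" in r
    ]

def load(rows: list[dict], table_id: str) -> None:
    errors = bigquery.Client().insert_rows_json(table_id, rows)  # streaming insert
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")

load(transform(extract("https://api.example.com/users")), "my-project.analytics.users")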

My current work with PostgreSQL, Python, and cloud services gives me hands-on experience with each of these stages. Learning tools like BigQuery and understanding data flows prepares me for building and maintaining data pipelines.