On This Page
1Overview2Architecture Overview
3Azure Service Topology4Implementation Guide
5Decision Criteria6Cost Model
7Anti-Patterns to Avoid8References

Overview

When batch processing is too slow — IoT telemetry that must trigger device alerts within seconds, financial transactions that must be scored before a user sees a response, clickstreams that must personalise an experience before the next page load — streaming is the architecture. Azure offers a unified track: Event Hubs for ingestion with native Kafka-protocol surface, Stream Analytics for managed SQL-based windowed processing, and Data Lake Gen2 for durable storage that outlives retention windows. Teams migrating from on-premises Kafka can repoint brokers to port 9093 with no application changes.

Architecture Overview

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%% flowchart TD START([High-velocity Data Source]) START --> INGEST{Ingestion Layer} INGEST -->|High-throughput events| EH[Event Hubs Premium\nKafka surface port 9093\n32 partitions, 7-day retention] INGEST -->|Raw archive required| CAP[Event Hubs Capture\nAvro to Data Lake\nRetention independent of hub] EH --> PROCESS{Processing Layer} PROCESS -->|Windowed analytics| ASA[Stream Analytics\nSAQL tumbling and hopping windows\nManaged scaling in SUs] PROCESS -->|Threshold alerting| FN_A[Azure Functions\nPer-event trigger\nAlert routing] CAP --> DL[Data Lake Gen2\nHierarchical namespace\nRaw and curated zones] ASA & FN_A --> STORE{Storage Layer} STORE -->|Document and state| COSMOS[Cosmos DB\nPre-aggregated results\nLow-latency reads] STORE -->|Async fan-out| SB[Service Bus\nTopic subscriptions\nDownstream consumers] DL --> ADF[Data Factory\nBatch ELT 90+ connectors\nOrchestrated pipelines] ADF --> SYNAPSE[Synapse Analytics\nSQL pools\nBI and reporting] COSMOS & SB & SYNAPSE --> DONE([Insights Available — Real Time and Historical]) style START fill:#4f8ef7,color:#fff style DONE fill:#10b981,color:#fff style ASA fill:#e0f2fe style DL fill:#e0f2fe

Azure Service Topology

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%% flowchart TD SOURCES[Data Sources\nIoT devices, Applications\nMicroservices via Kafka SDK] subgraph EH_NS["Event Hubs Premium Namespace"] HUB[device-telemetry Hub\n32 partitions, 7-day retention\nKafka surface on port 9093] CAPTURE[Event Hubs Capture\nAvro format, 300s or 300 MB\nArchive to raw-telemetry container] CG1[Consumer Group\nstream-analytics] CG2[Consumer Group\ndata-lake-writer] CG3[Consumer Group\nreal-time-alerting] end subgraph PROCESS_BLOCK["Real-Time Processing"] ASA_T[Stream Analytics StandardV2\n5-min tumbling window AVG and MAX\n3 SUs, PARTITION BY deviceId] FN_T[Alert Function\nTemperature threshold trigger\nService Bus output] end subgraph DESTINATIONS["Data Destinations"] DL_T[Data Lake Gen2\nRaw and curated zones\nHierarchical namespace] COSMOS_T[Cosmos DB\nAggregated telemetry\nPer-device real-time state] SB_T[Service Bus\nAlert topic\nDownstream subscribers] end ADF_T[Data Factory\nBatch ELT pipelines\nData Lake to Synapse] SYNAPSE_T[Synapse Analytics\nSQL pools and Spark\nBI and historical queries] SOURCES --> HUB HUB --> CAPTURE HUB --> CG1 & CG2 & CG3 CG1 --> ASA_T CG3 --> FN_T CG2 --> DL_T CAPTURE --> DL_T ASA_T --> COSMOS_T FN_T --> SB_T DL_T --> ADF_T ADF_T --> SYNAPSE_T style SOURCES fill:#4f8ef7,color:#fff style CAPTURE fill:#e0f2fe style ASA_T fill:#e0f2fe

Implementation Guide

Bicep — Event Hubs Premium Namespace with Capture

param location string = resourceGroup().location
param namespaceName string = 'evhns-telemetry-prod'
param dataLakeAccountName string
param dataLakeContainerName string = 'raw-telemetry'

resource eventHubNamespace 'Microsoft.EventHub/namespaces@2022-10-01-preview' = {
  name: namespaceName
  location: location
  sku: {
    name: 'Premium'
    tier: 'Premium'
    capacity: 1
  }
  properties: {
    zoneRedundant: true
    disableLocalAuth: true
    minimumTlsVersion: '1.2'
  }
}

resource eventHub 'Microsoft.EventHub/namespaces/eventhubs@2022-10-01-preview' = {
  parent: eventHubNamespace
  name: 'device-telemetry'
  properties: {
    partitionCount: 32          // permanent — size >= 2x peak consumer throughput
    messageRetentionInDays: 7
    captureDescription: {
      enabled: true
      encoding: 'Avro'
      intervalInSeconds: 300
      sizeLimitInBytes: 314572800   // 300 MB
      destination: {
        name: 'EventHubArchive.AzureBlockBlob'
        properties: {
          storageAccountResourceId: resourceId(
            'Microsoft.Storage/storageAccounts', dataLakeAccountName)
          blobContainer: dataLakeContainerName
          archiveNameFormat: (
            '{Namespace}/{EventHub}/{PartitionId}/'
            + '{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}')
        }
      }
    }
  }
}

resource cgStreamAnalytics 'Microsoft.EventHub/namespaces/eventhubs/consumergroups@2022-10-01-preview' = {
  parent: eventHub
  name: 'stream-analytics'
}

resource cgDataLakeWriter 'Microsoft.EventHub/namespaces/eventhubs/consumergroups@2022-10-01-preview' = {
  parent: eventHub
  name: 'data-lake-writer'
}

resource cgRealTimeAlerting 'Microsoft.EventHub/namespaces/eventhubs/consumergroups@2022-10-01-preview' = {
  parent: eventHub
  name: 'real-time-alerting'
}

Terraform equivalent: Use azurerm_eventhub_namespace (sku = "Premium", zone_redundant = true, local_authentication_enabled = false), azurerm_eventhub with capture_description block (encoding = "Avro", interval_in_seconds = 300), and azurerm_eventhub_consumer_group for each of the three consumer groups.

Bicep — Stream Analytics Job with Windowed Query

resource streamAnalyticsJob 'Microsoft.StreamAnalytics/streamingjobs@2021-10-01-preview' = {
  name: 'asa-telemetry-prod'
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    sku: {
      name: 'StandardV2'
    }
    eventsLateArrivalMaxDelayInSeconds: 60
    eventsOutOfOrderPolicy: 'Adjust'
    outputErrorPolicy: 'Drop'
    transformation: {
      name: 'MainTransformation'
      properties: {
        streamingUnits: 3    // start here; load test and scale on SU% metric
        query: '''
          -- 5-minute tumbling window aggregation to Cosmos DB
          SELECT
              deviceId,
              System.Timestamp()          AS windowEnd,
              AVG(Temperature)            AS avgTemperature,
              MAX(Temperature)            AS maxTemperature,
              COUNT(*)                    AS eventCount
          INTO [cosmos-output]
          FROM [event-hubs-input] PARTITION BY deviceId
          TIMESTAMP BY EventEnqueuedUtcTime
          GROUP BY deviceId, TumblingWindow(minute, 5)

          -- Per-event alert threshold to Service Bus
          SELECT
              deviceId,
              Temperature,
              EventEnqueuedUtcTime        AS alertTime
          INTO [alerts-output]
          FROM [event-hubs-input]
          WHERE Temperature > 85
        '''
      }
    }
  }
}

Terraform equivalent: Use azurerm_stream_analytics_job with streaming_units = 3, events_late_arrival_max_delay_in_seconds = 60, events_out_of_order_policy = "Adjust". Define the query in the transformation_query field. Wire inputs and outputs via azurerm_stream_analytics_stream_input_eventhub and the appropriate output resources.

Schema Registry on Event Hubs Premium

Event Hubs Premium includes a Schema Registry. Register schemas once per namespace; producers and consumers reference by schema ID. Avro, JSON Schema, and Protobuf are supported. Schema evolution uses compatibility modes (BACKWARD, FORWARD, FULL) identical to Confluent Schema Registry semantics — existing Kafka Schema Registry client code requires only a URL change.

Decision Criteria

Factor Event Hubs Stream Analytics Data Factory
Latency Sub-second ingest Sub-second to minutes (window size) Minutes to hours
Protocol AMQP, Kafka (port 9093), HTTPS SQL-like SAQL REST, 90+ connectors
Administration Minimal (namespace + hub) None (fully managed) Low (pipeline authoring)
Replay Up to 90-day retention Not applicable — stateless processing Full re-run from source
Ordering Per-partition ordered Preserved within partition Not guaranteed
Use when High-throughput event ingestion, Kafka migration Windowed aggregations, real-time routing Batch ELT to Synapse, scheduled pipelines

Cost Model

Event Hubs Premium is billed per Processing Unit (PU) at approximately $0.928/PU/hour in East US. One PU handles roughly 1 MB/s ingest and 2 MB/s egress. A single-PU namespace running continuously costs approximately $667/month. Event Hubs Capture adds $0.10 per million events archived. Stream Analytics is billed per Streaming Unit (SU) at approximately $0.081/SU/hour — three SUs costs around $175/month. Data Factory charges per pipeline activity run and data movement volume; a moderate batch workload (1,000 runs/day) costs approximately $30–60/month.

Cost optimisation levers

  • Right-size PUs at namespace level, not event hub level — one PU supports multiple event hubs within the namespace.
  • Enable Event Hubs Capture only on hubs where durable storage is required; egress charges apply to captured bytes leaving the namespace.
  • Stream Analytics SU scaling is opaque to consumers — monitor the SU% utilisation metric; scale up when sustained SU% exceeds 80%.
  • Use Dedicated Event Hubs clusters (20 CUs minimum, ~$14,600/month) only when namespace isolation, BYOK, or >10 PUs are required.
  • Data Factory costs are dominated by activity runs — batch small payloads into fewer large runs rather than triggering per-event.

Anti-Patterns to Avoid

⚠ 1. Sharing a Consumer Group Across Multiple Applications

Two independent services (Stream Analytics and a custom processor) reading from the same consumer group. Only one active reader per consumer group is supported; the second reader will be disconnected or will receive inconsistent offsets, causing event loss or duplication.

Hover to see the fix ↻
↺ Correct Approach

Provision one consumer group per consuming application. Consumer groups are free on Event Hubs — create stream-analytics, data-lake-writer, and real-time-alerting as separate groups, each maintaining its own offset independently.

⚠ 2. Skipping Event Hubs Capture for Durable Storage

Treating the event hub's retention window (7 days) as the data archive. When the retention window expires, events are gone. Downstream reprocessing, audit, and backfill jobs have no source.

Hover to see the fix ↻
↺ Correct Approach

Enable Event Hubs Capture on every production hub. Capture writes Avro files to Data Lake Gen2 at a 300-second or 300 MB interval — whichever is reached first. Retention on the Data Lake is independent of the hub retention window and can be managed via lifecycle policies.

⚠ 3. Not Right-Sizing Partition Count at Creation

Creating an event hub with the default 4 partitions for a workload that grows to 20 concurrent consumers or 10 MB/s ingest. Partition count is permanent on Event Hubs — it cannot be increased after creation without recreating the hub and its consumer group offsets.

Hover to see the fix ↻
↺ Correct Approach

Size partitions at hub creation to at least twice the peak expected consumer parallelism. The rule is partitionCount ≥ 2 × peak consumer throughput in MB/s. For most production workloads, 32 partitions is the safe default.

⚠ 4. Running Production Stream Analytics on a Single Streaming Unit

A 1-SU job processes all data on a single thread. PARTITION BY clauses, which unlock true parallelism, require at least 6 SUs. Under sustained load, a 1-SU job will lag, buffer, and eventually drop late events.

Hover to see the fix ↻
↺ Correct Approach

Start production jobs at 3 SUs minimum. Enable PARTITION BY deviceId (or equivalent high-cardinality key) in the query and scale to 6+ SUs when SU% exceeds 80% under load testing. Use the Stream Analytics job diagram in the portal to identify bottleneck steps.

Flowchart

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%% flowchart TD START([High-Velocity Data Event]) START --> CHOOSE{Select Processing Path} CHOOSE -->|Real-time windowed analytics| EH_ASA[Event Hubs Premium\nKafka port 9093, 32 partitions\n7-day retention] CHOOSE -->|Durable raw archive| CAP_D[Event Hubs Capture\nAvro to Data Lake\n300s or 300 MB interval] EH_ASA --> CG{Consumer Groups} CG -->|stream-analytics group| ASA_D[Stream Analytics StandardV2\n5-min tumbling window\nAVG and MAX per device] CG -->|real-time-alerting group| FN_D[Alert Function\nTemperature > 85 trigger\nService Bus output] CG -->|data-lake-writer group| DL_D[Data Lake Gen2\nRaw and curated zones\nHierarchical namespace] CAP_D --> DL_D ASA_D --> COSMOS_D[Cosmos DB\nAggregated device state\nReal-time queries] FN_D --> SB_D[Service Bus\nAlert topic\nDownstream subscribers] DL_D --> ADF_D[Data Factory\nBatch ELT pipelines\nOrchestrated to Synapse] ADF_D --> SYNAPSE_D[Synapse Analytics\nSQL pools and Spark\nBI and historical queries] COSMOS_D & SB_D & SYNAPSE_D --> DONE([Streaming Data Available for Insights]) style START fill:#4f8ef7,color:#fff style DONE fill:#10b981,color:#fff style ASA_D fill:#e0f2fe style CAP_D fill:#e0f2fe

References

  1. Microsoft — Azure Event Hubs scalability and partition count. https://learn.microsoft.com/azure/event-hubs/event-hubs-scalability
  2. Microsoft — Event Hubs Capture overview. https://learn.microsoft.com/azure/event-hubs/event-hubs-capture-overview
  3. Microsoft — Azure Stream Analytics windowing functions. https://learn.microsoft.com/azure/stream-analytics/stream-analytics-window-functions
  4. Microsoft — Azure Event Hubs for Apache Kafka. https://learn.microsoft.com/azure/event-hubs/azure-event-hubs-kafka-overview
  5. Microsoft — Azure Stream Analytics scaling with Streaming Units. https://learn.microsoft.com/azure/stream-analytics/stream-analytics-streaming-unit-consumption
  6. Microsoft — Azure Data Lake Storage Gen2 hierarchical namespace. https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-namespace
  7. Portal: AWS data streaming comparison
Ascendion Engineering Knowledge Base ← Cloud