On This Page
1Overview2Architecture Overview
3AWS Service Topology4Implementation Guide
5Decision Criteria6Cost Model
7Anti-Patterns to Avoid8References

Overview

When batch processing is too slow — financial transactions that must be detected as fraudulent within milliseconds, telemetry that must trigger auto-scaling within seconds, clickstreams that must personalise a page before the next request — streaming is the architecture. AWS provides two parallel tracks: the Kinesis family for AWS-native streaming and MSK for Kafka-protocol compatibility with existing tooling.

Architecture Overview

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'IBM Plex Sans, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%% flowchart TD START([High-velocity Data Source]) START --> INGEST{Ingestion Layer} INGEST -->|AWS-native streaming| KDS[Kinesis Data Streams\nReal-time, millisecond latency\nShard-based throughput] INGEST -->|Kafka protocol needed| MSK[Amazon MSK\nManaged Apache Kafka\nExisting Kafka tooling] INGEST -->|Delivery to storage only| KDF[Kinesis Data Firehose\nZero administration\nDirect to S3 or Redshift] KDS --> PROCESS{Processing Layer} MSK --> PROCESS PROCESS -->|Real-time analytics| KDA[Kinesis Data Analytics\nApache Flink\nSQL on streaming data] PROCESS -->|Event processing| LAMBDA_S[Lambda\nTrigger per shard\nBatch processing] KDA & LAMBDA_S --> STORE{Storage Layer} KDF --> STORE STORE -->|Analytics and ML| S3_S[S3 Data Lake\nParquet format\nGlue catalogue] STORE -->|Search and dashboards| OS[OpenSearch Service\nKibana dashboards\nAnomaly detection] STORE -->|Low-latency queries| DDB_S[DynamoDB\nPre-aggregated state\nReal-time lookups] S3_S & OS & DDB_S --> DONE([Insights Available — Real Time and Historical]) style START fill:#4f8ef7,color:#fff style DONE fill:#10b981,color:#fff style KDA fill:#e0f2fe style S3_S fill:#e0f2fe

AWS Service Topology

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'IBM Plex Sans, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%% flowchart TD SOURCES[Data Sources\nApplications, IoT, Clickstreams\nDatabase CDC via DMS] subgraph KINESIS["Kinesis Family"] KDS_T[Kinesis Data Streams\n1 shard = 1 MB/s in, 2 MB/s out\nRetention: 24h to 365 days] KDF_T[Kinesis Data Firehose\nBuffer: 1-15 minutes or 1-128 MB\nAutomatic format conversion] KDA_T[Kinesis Data Analytics\nApache Flink application\nWindowed aggregations] end subgraph MSK_BLOCK["Amazon MSK"] BROKER[Kafka Brokers\nManaged upgrades\nAutoscale storage] CONNECT[MSK Connect\nKafka Connect managed\nSource and sink connectors] SCHEMA_S[Glue Schema Registry\nAvro, JSON, Protobuf\nSchema evolution governance] end subgraph DESTINATIONS["Data Destinations"] S3_T[S3 Data Lake\nGlue Crawler then Data Catalogue\nAthena SQL queries] RS[Amazon Redshift\nFirehose direct load\nBI and reporting] OS_T[OpenSearch\nReal-time indexing\nKibana visualisation] end SOURCES --> KDS_T & KDF_T & BROKER KDS_T --> KDA_T KDS_T --> KDF_T KDA_T --> OS_T KDF_T --> S3_T & RS BROKER --> CONNECT CONNECT --> S3_T SCHEMA_S -.->|Schema enforcement| BROKER & KDS_T style SOURCES fill:#4f8ef7,color:#fff style SCHEMA_S fill:#e0f2fe

Implementation Guide

CDK Stack — Kinesis Data Stream with Lambda Consumer

import * as cdk from 'aws-cdk-lib';
import * as kinesis from 'aws-cdk-lib/aws-kinesis';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as lambdaEventSources from 'aws-cdk-lib/aws-lambda-event-sources';
import * as firehose from '@aws-cdk/aws-kinesisfirehose-alpha';
import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations-alpha';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Duration, RemovalPolicy } from 'aws-cdk-lib';

export class DataStreamingStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Kinesis stream — shard count formula:
    //   shards = max(ingest MB/s, consume MB/s / 2)
    const stream = new kinesis.Stream(this, 'AppStream', {
      shardCount: 2,         // 2 MB/s ingest, 4 MB/s consume
      retentionPeriod: Duration.hours(24),
      encryption: kinesis.StreamEncryption.KMS,
      streamMode: kinesis.StreamMode.PROVISIONED,
    });

    // Lambda consumer with enhanced fan-out
    const processor = new lambda.Function(this, 'StreamProcessor', {
      runtime: lambda.Runtime.NODEJS_20_X,
      architecture: lambda.Architecture.ARM_64,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/stream-processor'),
      timeout: Duration.minutes(5),
    });

    processor.addEventSource(
      new lambdaEventSources.KinesisEventSource(stream, {
        startingPosition: lambda.StartingPosition.TRIM_HORIZON,
        batchSize: 100,
        bisectBatchOnError: true,
        maxBatchingWindow: Duration.seconds(5),
        reportBatchItemFailures: true,
      }));

    // Firehose — archive all events to S3
    const archiveBucket = new s3.Bucket(this, 'ArchiveBucket', {
      versioned: true,
      removalPolicy: RemovalPolicy.RETAIN,
    });

    new firehose.DeliveryStream(this, 'ArchiveStream', {
      sourceStream: stream,
      destinations: [
        new destinations.S3Bucket(archiveBucket, {
          dataOutputPrefix: 'events/year=!{timestamp:yyyy}/'
            + 'month=!{timestamp:MM}/day=!{timestamp:dd}/',
          bufferingInterval: Duration.minutes(5),
          bufferingSize: cdk.Size.mebibytes(64),
        }),
      ],
    });
  }
}

Decision Criteria

Factor Kinesis Data Streams Amazon MSK Kinesis Firehose
Protocol AWS proprietary Apache Kafka AWS proprietary
Latency Sub-second Sub-second 60–900 seconds
Administration Minimal Medium None
Consumer groups Shard iterators Kafka consumer groups Not applicable
Replay 24h–365 days Configurable retention Not supported
Use when New AWS-native streaming Kafka migration or ecosystem Delivery to S3/Redshift only

Cost Model

Kinesis Data Streams costs $0.015 per shard hour plus $0.014 per million PUT payload units. Two shards running continuously costs approximately $22/month plus data charges. MSK costs are EC2-based — a three-broker cluster with kafka.m5.large instances costs approximately $200/month plus storage.

Anti-Patterns to Avoid

⚠ 1. Using Kinesis When SQS Is Sufficient

Adding a Kinesis stream for simple task queue use cases — sending emails, resizing images, processing orders one at a time. Kinesis is designed for high-throughput ordered streams, not task queues. The shard management, enhanced fan-out, and iterator tracking adds complexity for no benefit.

Hover to see the fix ↻
↺ Correct Approach

Use SQS for task queue patterns. Use Kinesis when you need ordered, high-throughput streaming with multiple independent consumers reading the same stream at different positions.

⚠ 2. Ignoring Shard Hot Spots

Distributing records across shards using a low-cardinality partition key such as event type or region. All records of type OrderCreated go to shard 1, which hits its 1 MB/s write limit and drops records under load while other shards sit idle.

Hover to see the fix ↻
↺ Correct Approach

Use a high-cardinality partition key — customer ID, order ID, or device ID. Distribute evenly across all shards. Monitor the MaxPutRecordsThrottled metric per shard.

Flowchart

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'IBM Plex Sans, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%% flowchart TD START([High-Velocity Data Event]) START --> CHOOSE{Select Streaming Service} CHOOSE -->|New AWS streaming, low ops| KDS[Kinesis Data Streams\nProvisioned or on-demand shards\n1 MB per second per shard] CHOOSE -->|Kafka ecosystem required| MSK[Amazon MSK\nManaged Kafka brokers\nExisting consumer groups] CHOOSE -->|Delivery to S3 or Redshift only| KDF[Kinesis Firehose\nZero administration\nBuffered delivery] KDS --> CONSUME{Consumer Type} CONSUME -->|Real-time analytics| KDA[Kinesis Analytics\nApache Flink\nWindowed SQL] CONSUME -->|Event processing| FN_S[Lambda Function\nBatch size 100\nBisect on error] MSK --> CONNECT_S[MSK Connect\nKafka Connect\nSink to S3 or OpenSearch] KDA --> STORE_S[OpenSearch\nDashboards\nAnomaly detection] FN_S --> DDB_S2[DynamoDB\nAggregated state\nReal-time queries] KDF --> S3_D[S3 Data Lake\nParquet via Glue\nAthena queryable] CONNECT_S --> S3_D S3_D & STORE_S & DDB_S2 --> DONE([Streaming Data Available for Insights]) style START fill:#4f8ef7,color:#fff style DONE fill:#10b981,color:#fff style KDA fill:#e0f2fe style CONNECT_S fill:#e0f2fe

References

  1. AWS — Amazon Kinesis Data Streams Developer Guide. docs.aws.amazon.com/streams
  2. AWS — Amazon MSK Developer Guide. docs.aws.amazon.com/msk
  3. AWS — Kinesis Data Firehose Developer Guide. docs.aws.amazon.com/firehose
  4. Apache Kafka — Documentation. kafka.apache.org/documentation
Ascendion Engineering Knowledge Base ← Cloud