WebSockets require a long-running process on the server, but serverless functions are short-lived. This makes a serverless architecture a poor fit for holding WebSocket connections directly. That said, there are workarounds for building a scalable WebSocket server. We will set up a Serverless infrastructure for WebSockets using Python, Amazon API Gateway, AWS Lambda, and DynamoDB, all declared with the AWS CDK. Let’s get started!

The problem – WebSocket and Serverless

WebSocket is a protocol for a persistent connection between a client and a server that allows bi-directional exchange of data. It is generally faster than repeated HTTP requests because it needs only one handshake request, with one initial set of headers, to establish the connection.

After establishing the connection, the server keeps it alive to receive and send messages. The server stores a reference to the connection either in memory or in a database, which allows the WebSocket server to send messages back to the client(s). Keeping connections alive and storing references to all connected clients takes a lot of memory, so memory and connection limits often become a bottleneck when too many clients connect.

In a Serverless architecture, there are no long-lived servers. Cloud providers create and destroy virtual servers on demand, and each one may last only a few minutes. Any memory available to a WebSocket server running there is just as short-lived: it is destroyed along with the server.

So how can we manage a persistent WebSocket connection in a Serverless environment that clears its memory every few minutes?
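
To make the problem concrete, here is a minimal sketch in plain Python with no AWS calls. The names ShortLivedWorker and external_store are hypothetical, invented for this illustration: each worker stands in for one short-lived server, and the shared set stands in for a database that outlives it.

```python
class ShortLivedWorker:
    """Hypothetical stand-in for one short-lived serverless environment."""

    def __init__(self, external_store):
        self.local_connections = set()        # lost when the worker is destroyed
        self.external_store = external_store  # outlives any single worker

    def connect(self, connection_id):
        self.local_connections.add(connection_id)
        self.external_store.add(connection_id)


external_store = set()  # stand-in for a database table

first = ShortLivedWorker(external_store)
first.connect("client-1")
del first  # the provider recycles the environment

second = ShortLivedWorker(external_store)
print(second.local_connections)  # set()
print(second.external_store)     # {'client-1'}
```

The second worker knows nothing about the client in its own memory, but it can still find the connection in the external store. That is the role DynamoDB plays in the rest of this article.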

Building Serverless WebSocket Infrastructure with AWS

AWS offers many resources for building a Serverless WebSocket infrastructure. We’ll use the following services:

  • Amazon API Gateway
  • AWS Lambda functions (Python)
  • Amazon DynamoDB
  • AWS CDK (Python)

Amazon API Gateway is a public-facing managed service that acts as a “front door” for applications. API Gateway provides functionality for both RESTful APIs and WebSocket APIs. The key benefits of using API Gateway for WebSockets are:

  • it maintains the persistent connection for your application
  • it manages the message transfer between your backend services and clients.

Architecture

The following setup uses AWS CDK in Python to declare the infrastructure as code.

Overview of WebSocket Serverless Infrastructure in AWS

1. Create the Stack

In CDK, a stack is an encapsulation of infrastructure resources. Resources are anything from EC2 instances and DynamoDB tables to IAM roles and permissions. By creating new instances of the following WebSocketStack, different environments can be synthesized and deployed.

from aws_cdk import App, Environment, Stack

# Placeholder account ID and region; replace with your own.
ENV = Environment(
    account="99999999999",
    region="us-east-1"
)

app = App()

class WebSocketStack(Stack):
    def __init__(self, scope, id_, env, **kwargs):
        # Pass env through so the stack is bound to the account and region.
        super().__init__(scope, id_, env=env, **kwargs)

        # Stack infrastructure code will go here.


DevStack = WebSocketStack(app, 'dev', ENV)
StageStack = WebSocketStack(app, 'stage', ENV)
ProdStack = WebSocketStack(app, 'prod', ENV)

app.synth()

AWS CDK’s synth command runs the WebSocketStack code and produces a CloudFormation template as JSON. AWS CloudFormation uses that template to create all the resources described in the code.
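
As a rough illustration of what that output contains, the sketch below counts the resource types in a CloudFormation template. The helper name resource_types and the sample template are assumptions made up for this example; a real template is written by synth (to the cdk.out directory by default) and contains many more resources and properties.

```python
import json
from collections import Counter


def resource_types(template):
    """Count the CloudFormation resource types declared in a template dict."""
    resources = template.get("Resources", {})
    return Counter(res["Type"] for res in resources.values())


# Hand-written sample in the CloudFormation template shape, not real synth output.
sample = json.loads("""
{
  "Resources": {
    "ConnectionsTable": {"Type": "AWS::DynamoDB::Table"},
    "WebsocketFunction": {"Type": "AWS::Lambda::Function"},
    "AuthorizerFunction": {"Type": "AWS::Lambda::Function"}
  }
}
""")

for type_name, count in sorted(resource_types(sample).items()):
    print(type_name, count)
```

Running the same helper over a synthesized template is a quick way to check that a stack declares what you expect before deploying it.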

2. Create DynamoDB Connection Table

Each AWS service has its own namespaced module within the AWS CDK library. Here we import aws_dynamodb to describe the Connections table.

We will use this table to store all the active WebSocket connections. API Gateway provides a connection ID for each connected WebSocket client, which we store as connection_id.

from aws_cdk import aws_dynamodb as dynamodb

...

connections_table = dynamodb.Table(
    self,
    "Connections",
    partition_key=dynamodb.Attribute(
        name="connection_id",
        type=dynamodb.AttributeType.STRING
    )
)

3. Define Lambda Function

Next, we define the Lambda function that will run all the WebSocket logic: handling incoming messages and sending messages back to clients.

from aws_cdk import Duration
from aws_cdk import aws_lambda
from aws_cdk.aws_lambda_python_alpha import PythonFunction

...

websocket_function = PythonFunction(
    self,
    "WebsocketFunction",
    entry="../path/to/code/",
    index="websocket.py",
    runtime=aws_lambda.Runtime.PYTHON_3_9,
    handler="handler",
    architecture=aws_lambda.Architecture.X86_64,
    memory_size=512,
    timeout=Duration.seconds(30),
    environment={
        "CONNECTIONS_TABLE": connections_table.table_name
    }
)

Also, grant this Lambda permission to read and write the WebSocket connections in the DynamoDB table.

connections_table.grant_read_write_data(websocket_function)

4. Managing the Connection ID

Now that we have the WebSocket function declared, we can create the handler to receive the incoming requests.

# websocket.py (the index file named in the Lambda definition above)

import os

import boto3

dynamodb = boto3.client("dynamodb")

connections_table = os.getenv("CONNECTIONS_TABLE") # env var in Lambda function above.

def handler(event, context):
    # event and context are provided from AWS Lambda invocations.
    status_code = 200
    route = event["requestContext"]["routeKey"] # $connect, $disconnect or custom route key.
    connection_id = event["requestContext"]["connectionId"] # Websocket connection ID.

    if route == "$connect":
        connect_device(connection_id)
    elif route == "$disconnect":
        disconnect_device(connection_id)
    elif route == "chat":
        # chat message implementation.
        pass
    else:
        # Unknown route key
        status_code = 400

    return {"statusCode": status_code}


def connect_device(connection_id):
    dynamodb.put_item(
        TableName=connections_table,
        Item={"connection_id": {"S": connection_id}}
    )


def disconnect_device(connection_id):
    dynamodb.delete_item(
        TableName=connections_table,
        Key={"connection_id": {"S": connection_id}}
    )

All incoming WebSocket messages will be passed through this handler.
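
Because the handler reads only a couple of fields from the event, its routing is easy to try locally. The sketch below is a stand-in rather than the handler itself: route_event and fake_event are hypothetical helpers that mirror the branching above, and a plain set replaces the DynamoDB table so that no AWS credentials are needed.

```python
import json


def route_event(event, connections):
    """Mirror the handler's routing, using a set instead of DynamoDB."""
    context = event["requestContext"]
    route = context["routeKey"]
    connection_id = context["connectionId"]

    if route == "$connect":
        connections.add(connection_id)
    elif route == "$disconnect":
        connections.discard(connection_id)
    elif route == "chat":
        # Message routes carry the client's payload as a string in "body".
        json.loads(event.get("body") or "{}")
    else:
        return {"statusCode": 400}
    return {"statusCode": 200}


def fake_event(route_key, connection_id, body=None):
    """Build the minimal event shape the handler relies on."""
    event = {"requestContext": {"routeKey": route_key, "connectionId": connection_id}}
    if body is not None:
        event["body"] = json.dumps(body)
    return event


connections = set()
print(route_event(fake_event("$connect", "abc123"), connections))     # {'statusCode': 200}
print(route_event(fake_event("unknown", "abc123"), connections))      # {'statusCode': 400}
print(route_event(fake_event("$disconnect", "abc123"), connections))  # {'statusCode': 200}
```

Real events from API Gateway carry many more fields; this sketch keeps only the ones the routing depends on.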

5. Authorization

If you need to check authorization before a client can establish a WebSocket connection to the server, use a Lambda authorizer. This is a feature of Amazon API Gateway that uses a Lambda function to control access to WebSocket requests. API Gateway triggers this function every time a new client tries to establish a connection.

from aws_cdk import Duration
from aws_cdk import aws_lambda
from aws_cdk.aws_lambda_python_alpha import PythonFunction
from aws_cdk.aws_apigatewayv2_authorizers_alpha import WebSocketLambdaAuthorizer

...

authorizer_function = PythonFunction(
    self,
    "AuthorizerFunction",
    entry="../path/to/your/code/",
    index="authorizer.py",
    runtime=aws_lambda.Runtime.PYTHON_3_9,
    handler="handler",
    architecture=aws_lambda.Architecture.X86_64,
    memory_size=512,
    timeout=Duration.seconds(30),
    environment={
        "CUSTOM_ENV_VAR": ""
    }
)

authorizer = WebSocketLambdaAuthorizer(
    "WebSocketAuthorizer",
    authorizer_function,
    identity_source=[
        "route.request.header.Authorization"
    ]
)

The identity_source above names where the authorization token is read from: the Authorization header sent with the WebSocket connection request.

Authorization: Bearer <token>
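
The contents of authorizer.py are not shown here, so the following is only a sketch of what such a handler might look like. It assumes a single shared token held in a hypothetical EXPECTED_TOKEN environment variable, and a fixed principal ID of "user"; a real service would more likely verify a signed token. The response is an IAM policy that allows or denies execute-api:Invoke on the methodArn that API Gateway passes in the event.

```python
import hmac
import os


def build_policy(principal_id, effect, resource):
    """Build the IAM policy document a Lambda authorizer returns."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def handler(event, context):
    # Header names may arrive in any case, so normalise before looking up.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    scheme, _, token = headers.get("authorization", "").partition(" ")

    expected = os.environ.get("EXPECTED_TOKEN", "")  # hypothetical shared secret
    allowed = (
        scheme == "Bearer"
        and expected != ""
        and hmac.compare_digest(token.encode(), expected.encode())
    )
    return build_policy("user", "Allow" if allowed else "Deny", event["methodArn"])
```

One caveat: the browser WebSocket API does not let scripts set custom headers such as Authorization, so a header-based identity source suits non-browser clients. Browser clients usually pass the token another way, for example in the query string with a matching identity source.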

6. WebSocket Routing

Now that we have the WebSocket and Authorization Lambdas defined, the next step is to connect the WebSocket Routes to the Lambda Functions.

API Gateway’s WebSocketApi handles client connections and disconnections. These events arrive on predefined Routes with the Route Keys $connect and $disconnect, and API Gateway passes them to the WebSocket Function handler.

The Authorizer Function only needs to be attached to the $connect Route, because all later messages travel over the same, already authorized connection.

from aws_cdk import aws_apigatewayv2_alpha as apigwv2alpha
from aws_cdk.aws_apigatewayv2_integrations_alpha import WebSocketLambdaIntegration

...

websocket_api = apigwv2alpha.WebSocketApi(
    self,
    "WebsocketAPI",
    connect_route_options=apigwv2alpha.WebSocketRouteOptions(
        integration=WebSocketLambdaIntegration(
            "ConnectIntegration", websocket_function
        ),
        authorizer=authorizer
    ),
    disconnect_route_options=apigwv2alpha.WebSocketRouteOptions(
        integration=WebSocketLambdaIntegration(
            "DisconnectIntegration", websocket_function
        ),
    )
)

To add more custom routes, add a new Route with a Route Key and the WebSocket Function that will handle this action.

websocket_api.add_route(
    "chat",
    integration=WebSocketLambdaIntegration(
        "ChatIntegration", websocket_function
    )
)
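
How does API Gateway decide that a message belongs to the chat route? It evaluates the API's route selection expression against each incoming message. The sketch below assumes the expression is $request.body.action, which we believe is the default for the CDK WebSocketApi construct; check your own API's setting. resolve_route_key is a hypothetical helper that only imitates that lookup locally.

```python
import json


def resolve_route_key(raw_message, custom_routes):
    """Imitate route selection on $request.body.action for one message."""
    try:
        action = json.loads(raw_message).get("action")
    except (ValueError, AttributeError):
        action = None  # not JSON, or not a JSON object
    # Messages that match no custom route fall through to $default
    # (which only handles them if the API defines that route).
    if isinstance(action, str) and action in custom_routes:
        return action
    return "$default"


routes = {"chat"}
message = json.dumps({"action": "chat", "text": "hello"})
print(resolve_route_key(message, routes))       # chat
print(resolve_route_key("plain text", routes))  # $default
```

Under that assumption, a client reaches the chat route by sending a JSON message whose action field is "chat".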

Finally, create a WebSocketStage to expose the API to the public.

websocket_stage = apigwv2alpha.WebSocketStage(
    self,
    "WebsocketStage",
    web_socket_api=websocket_api,
    stage_name="dev", # stage/prod
    auto_deploy=True
)

Once the WebSocketStage has been created, it exposes two URLs: a wss:// URL that clients connect to, and an HTTPS callback URL that backend code uses to send messages to connected clients. We can store the callback URL as an environment variable to be used in the Lambda functions.

websocket_function.add_environment("API_GATEWAY_URL", websocket_stage.callback_url)

Send Messages from Lambda

To send a message back to a client connection, the Lambda handler needs to know two things:

  1. the API Gateway HTTPS URL
  2. the client’s Connection ID, which we stored in the DynamoDB table.

Using the boto3 library, we can send a message to API Gateway, which passes it along to the client. The function’s execution role must also be allowed to call this API (the execute-api:ManageConnections action).

import json
import os

import boto3

api_gateway_conn = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url=os.environ["API_GATEWAY_URL"], # set on the function by the stack above.
    region_name=os.environ["AWS_REGION"] # set by the Lambda runtime.
)

def send_message(connection_id, payload):
    try:
        # Query to validate that the WebSocket connection exists.
        api_gateway_conn.get_connection(ConnectionId=connection_id)
    except api_gateway_conn.exceptions.GoneException:
        # The client has already disconnected.
        return False

    # Send a message to the WebSocket connection through API Gateway.
    api_gateway_conn.post_to_connection(
        ConnectionId=connection_id,
        Data=json.dumps(payload)
    )
    return True
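
A common next step is sending one message to every stored connection and forgetting the ones that have gone away. The sketch below keeps the AWS calls out of the picture so that it runs anywhere: send and forget are injected callables, and StaleConnection is a hypothetical stand-in for the GoneException that the management API raises for a closed connection. In the Lambda, send would wrap post_to_connection and forget would wrap disconnect_device.

```python
import json


class StaleConnection(Exception):
    """Hypothetical stand-in for the management API's GoneException."""


def broadcast(connection_ids, payload, send, forget):
    """Send payload to every connection; drop and report the stale ones."""
    data = json.dumps(payload)
    stale = []
    for connection_id in connection_ids:
        try:
            send(connection_id, data)
        except StaleConnection:
            forget(connection_id)  # remove the dead connection from storage
            stale.append(connection_id)
    return stale


# Usage with in-memory fakes in place of API Gateway and DynamoDB.
stored = {"a", "b", "c"}
delivered = []

def fake_send(connection_id, data):
    if connection_id == "b":
        raise StaleConnection(connection_id)
    delivered.append(connection_id)

print(broadcast(sorted(stored), {"message": "data"}, fake_send, stored.discard))  # ['b']
print(sorted(stored))  # ['a', 'c']
```

Cleaning up on failure matters because a $disconnect event is not guaranteed to arrive for every client, so the table can accumulate connections that no longer exist.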

Summary

This infrastructure can now be launched as a base for scalable WebSocket connections.
