A cheat's guide to serverless standards (AWS edition)

Scott Griffiths / May 12, 2020

8 min read

TL;DR#

Good practice and design considerations for implementing FaaS and serverless solutions.

FaaS 101#

Function-as-a-Service, or FaaS, is an event-driven execution model in which functions run in stateless containers and manage server-side logic and state through the use of external services.

Serverless 101#

[Image: serverless-lens]

What Are We Trying to Achieve with Standards#

To have a standardised approach to development, testing, deployment, operations and support for all FaaS applications and serverless services within the business.

Approach#

  1. Review the current state of Lambdas being created within the business
  2. Look into the frameworks being used (if any)
  3. Socialise FaaS through training and a guild
  4. Base this around the AWS Well-Architected Framework (WAF) and Serverless Lens whitepapers

The Ideal Result Being#

A collaboration and ownership culture where standards can encapsulate and define how we do:

  • Reporting / Logging
  • Alerting and notifications (PagerDuty, Slack etc)
  • All the tests (Load, Unit, Integration, E2E)
  • DevSecOps
  • The use of frameworks
  • Patterns and sample applications
  • When to use serverless over conventional practices

For the most part, this is based on the Serverless Manifesto and the Twelve-Factor App guides.

Items Covered#

  1. AWS Well-Architected pillars
  2. General Design Principles
  3. Key AWS available services
  4. AWS service options (cheatsheet)
  5. AWS Serverless Lens good practice

AWS Well-Architected Framework (WAF) pillars - Summarised#

'Thinking cloud natively, providing a consistent approach, and making sure that foundational areas are thought about, which include:'

  1. Operational Excellence
    • Prepare
    • Operate
    • Evolve
  2. Security
    • Identity and access management
    • Data/infra protection
    • Detective controls
    • Incident response
  3. Reliability
    • Foundations
    • Change Management
    • Failure management
  4. Performance Efficiency
    • Correct resource types
    • Review current/future resource types
    • Monitoring
    • Trade-offs
  5. Cost Optimization
    • Cost effective resources
    • Match supply vs demand
    • Expenditure awareness
    • Optimizing over time

General Design Principles#

The Well-Architected Framework identifies a set of general design principles to facilitate good design in the cloud for serverless applications:

  • Speedy, simple, singular

    Functions are concise, short, single purpose and their environment may live up to their request lifecycle. Transactions are efficiently cost aware and thus faster executions are preferred

  • Think concurrent requests, not total requests

    Serverless applications take advantage of the concurrency model, and tradeoffs at the design level are evaluated based on concurrency.

  • Share nothing

    Function runtime environment and underlying infrastructure are short-lived, therefore local resources such as temporary storage are not guaranteed. State can be manipulated within a state machine execution lifecycle, and persistent storage is preferred for highly durable requirements.

  • Assume no hardware affinity

    Underlying infrastructure may change. Leverage code or dependencies that are hardware-agnostic as CPU flags, for example, may not be available consistently.

  • Orchestrate your application with state machines, not functions

    Chaining Lambda executions within the code to orchestrate the workflow of your application results in a monolithic and tightly coupled application. Instead, use a state machine to orchestrate transactions and communication flows.

  • Use events to trigger transactions

    Events such as writing a new Amazon S3 object or an update to a database allow for transaction execution in response to business functionalities. This asynchronous event behavior is often consumer agnostic and drives just-in-time processing to ensure lean service design.

  • Design for failures and duplicates

    Operations triggered from requests/events must be idempotent as failures can occur and a given request/event can be delivered more than once. Include appropriate retries for downstream calls.

Key AWS Services#

Area | Services available
Reliability | AWS Marketplace, Trusted Advisor, CloudWatch Logs, CloudWatch, API Gateway, Lambda, X-Ray, Step Functions, Amazon SQS, and Amazon SNS
Performance Efficiency | DynamoDB Accelerator, API Gateway, Step Functions, NAT gateway, Amazon VPC, and Lambda
Operational Excellence | AWS Systems Manager Parameter Store, AWS SAM, CloudWatch, AWS CodePipeline, AWS X-Ray, Lambda, and API Gateway

[Image: AWS services cheatsheet (Area, Service options, Role)]

AWS Serverless Lens#

Come up with plans to define and implement the following.

Metrics to Gather:#

  • Business Metrics
  • Customer Experience Metrics
  • System Metrics
  • Operational Metrics

Alarming#

  • AWS Lambda

    Duration, Errors, Throttling, and ConcurrentExecutions. For stream-based invocations, alert on IteratorAge. For asynchronous invocations, alert on DeadLetterErrors.

  • API Gateway

    IntegrationLatency, Latency, 5XXError

  • Application Load Balancer

    HTTPCode_ELB_5XX_Count, RejectedConnectionCount, HTTPCode_Target_5XX_Count, UnHealthyHostCount, LambdaInternalError, LambdaUserError

  • AWS AppSync

    5XX and Latency

  • SQS

    ApproximateAgeOfOldestMessage

  • Kinesis Data Streams

    ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded, GetRecords.IteratorAgeMilliseconds, PutRecord.Success, PutRecords.Success (if using Kinesis Producer Library), and GetRecords.Success

  • SES

    Rejects, Bounces, Complaints, Rendering Failures

  • AWS Step Functions

    ExecutionThrottled, ExecutionsFailed, ExecutionsTimedOut

  • EventBridge

    FailedInvocations, ThrottledRules

  • Amazon S3

    5xxErrors, TotalRequestLatency

  • Amazon DynamoDB

    ReadThrottleEvents, WriteThrottleEvents, SystemErrors, ThrottledRequests
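As a rough sketch of how one of the alarms above might be defined, the function below builds the parameters for a CloudWatch alarm on the Lambda `Errors` metric. The threshold, period, and alarm name are assumptions picked for illustration, and `lambda_errors_alarm` is a hypothetical helper. Nothing here calls AWS.

```python
from typing import Any, Dict


def lambda_errors_alarm(function_name: str, threshold: int = 1) -> Dict[str, Any]:
    """Build alarm parameters for the Errors metric of one Lambda function."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,              # seconds per evaluation window (assumed)
        "EvaluationPeriods": 1,    # alarm after one breaching window (assumed)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no errors
    }


params = lambda_errors_alarm("orders-api")
print(params["AlarmName"])
```

A dict shaped like this could be passed to the CloudWatch `put_metric_alarm` call in boto3 as keyword arguments. The same shape applies to the other metrics listed, with the namespace, metric name, and dimensions changed.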

Logging (Centralised)#

Log a CorrelationId, request identifiers etc, and use the correct log levels (e.g. ‘Info’).
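A minimal sketch of what this might look like in a Python Lambda, assuming JSON log lines and a caller-supplied `x-correlation-id` header. The header name, the `orders` logger name, and the helper functions are hypothetical. `aws_request_id` is the request identifier exposed on the Lambda context object.

```python
import json
import logging
import uuid
from typing import Any, Dict

logger = logging.getLogger("orders")  # hypothetical logger name
logger.setLevel(logging.INFO)


def format_log(level: str, message: str, correlation_id: str, **fields: Any) -> str:
    """Build one JSON log line so a central log store can filter on it."""
    record = {"level": level, "message": message, "correlationId": correlation_id}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


def get_correlation_id(event: Dict[str, Any]) -> str:
    """Reuse the caller's id when present, otherwise start a new one."""
    headers = event.get("headers") or {}
    return headers.get("x-correlation-id") or str(uuid.uuid4())


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    correlation_id = get_correlation_id(event)
    # The context is None when the handler is called outside Lambda.
    request_id = getattr(context, "aws_request_id", None)
    logger.info(format_log("INFO", "order received", correlation_id, requestId=request_id))
    # Pass the id on so downstream services can log the same value.
    return {"correlationId": correlation_id}
```

Passing the same id to every downstream call is what lets a single request be followed across functions in the central log store.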

Tracing - Distributed#

Identify performance degradation and quickly understand anomalies, including latency distributions

Prototyping#

"Every Lambda Should Be Treated as a Microservice for Testing"

The easiest way to test a serverless system as a whole may be to create a separate copy of the system in a non-linked AWS account (or other cloud provider).

  • Unit | For the most part the same as non serverless
    • Creating modularized code as separate functions outside of the handler enables more unit-testable functions
    • Mock out the cloud services
  • Integration | Every Lambda should be treated as a microservice for testing
    • Don't use mocks; exercise the live services/functions where possible
  • Performance | Use Lambda and 3rd party marketplace apps to load test
    • Steady and burst rates
    • Test async concurrency
  • End to End
    • The entire “distributed system” in a staging style environment with reasonable data
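The unit-testing advice above (logic outside the handler, cloud services mocked out) can be sketched roughly like this. `calculate_total`, `save_order`, and the event shape are hypothetical and exist only for illustration.

```python
import json
from typing import Any, Callable, Dict, List, Optional
from unittest.mock import Mock


def calculate_total(items: List[Dict[str, int]]) -> int:
    """Pure business logic: no cloud calls, so it can be unit tested directly."""
    return sum(item["price"] * item["quantity"] for item in items)


def handler(
    event: Dict[str, Any],
    context: Any = None,
    save_order: Optional[Callable[[Dict[str, int]], None]] = None,
) -> Dict[str, Any]:
    """Thin handler: parse input, call the logic, delegate the cloud call."""
    items = json.loads(event["body"])["items"]
    total = calculate_total(items)
    if save_order is not None:
        # In a deployed function this would write to a real data store.
        save_order({"total": total})
    return {"statusCode": 200, "body": json.dumps({"total": total})}


# Unit test style usage: the cloud-facing dependency is replaced with a mock.
fake_save = Mock()
event = {"body": json.dumps({"items": [{"price": 5, "quantity": 2}]})}
response = handler(event, save_order=fake_save)
fake_save.assert_called_once_with({"total": 10})
```

Injecting `save_order` is one design choice among several. Patching the module-level client with `unittest.mock.patch` would achieve the same isolation.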

Deploying#

  1. Use a framework where possible (Serverless Framework/SAM) to model, prototype, build, package, and deploy (parameterize the app and its dependencies)
  2. Use synthetic traffic, custom metrics, and alerts as part of a rollout deployment, and use canaries
  3. Separate environments by account (Dev, UAT, PROD)

Security#


Identity and Access Management#

Don't
  • ❌ API Gateway API keys are not a security mechanism and should not be used for authorization unless it’s a public API
  • ❌ Sharing an IAM role across more than one Lambda function is not recommended
Do
  • ✅ Use Amazon API Gateway resource policies
  • ✅ Use least-privileged access and only allow the access needed to perform a given operation.
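To make least-privileged access a little more concrete, here is a small sketch that builds an IAM policy document for a single resource and refuses wildcards. The table ARN, account id, and the `least_privilege_policy` helper are made up for the example. The document layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`) follows the standard IAM JSON policy format.

```python
import json
from typing import Any, Dict, List


def least_privilege_policy(resource_arn: str, actions: List[str]) -> Dict[str, Any]:
    """Build a policy that allows only the named actions on one resource."""
    for value in actions + [resource_arn]:
        if "*" in value:
            # Wildcards grant more than a single operation needs.
            raise ValueError(f"wildcard not allowed: {value}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": resource_arn}
        ],
    }


# Hypothetical table ARN and action list for one function's execution role.
policy = least_privilege_policy(
    "arn:aws:dynamodb:eu-west-1:123456789012:table/orders",
    ["dynamodb:GetItem", "dynamodb:PutItem"],
)
print(json.dumps(policy, indent=2))
```

Giving each function its own role with a policy like this keeps one function's permissions from leaking into another.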

Infra Protection#

Favor dynamic authentication, such as temporary credentials with AWS IAM over static keys. API Gateway and AWS AppSync both support IAM Authorization

Data Protection#

  • Watch out when writing to stdout. This may print sensitive info to CloudWatch
  • Use a secrets manager that allows for rotation
  • Encrypt any sensitive data traversing your serverless application
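One way to reduce the risk of sensitive values reaching the logs is to redact them before printing. The sketch below is an illustration only: the `SENSITIVE_KEYS` set is an assumed list and would need to match the fields your own payloads carry.

```python
import json
from typing import Any

# Assumption: field names treated as sensitive in this example.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number"}


def redact(value: Any) -> Any:
    """Return a copy of the payload with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: "***" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


event = {"user": "sam", "password": "hunter2", "items": [{"Token": "abc", "qty": 1}]}
# Only the redacted copy is written to stdout; the original event is unchanged.
print(json.dumps(redact(event)))
```

Matching on key names is a coarse filter. It does not catch secrets embedded in free-text values, so it complements rather than replaces encryption and a secrets manager.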

Reliability#

  • Regulate inbound request rates
  • Throttle at the API level based on access patterns (return 429)
  • Issue API keys to clients with usage plans
  • Look at concurrency controls where needed, and for single-shard processing look at using Kinesis Data Streams
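The throttling behaviour above (a steady rate, a burst allowance, and a 429 when the limit is hit) can be illustrated with a token bucket, the algorithm API Gateway documents for its throttling. This is a simplified sketch, not how the managed service is implemented. `TokenBucket` and `respond` are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow `rate` requests per second with bursts of up to `burst` requests."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1:
            self.tokens -= 1  # one token per request
            return True
        return False


def respond(bucket: TokenBucket, now: float) -> int:
    """Return the HTTP status a throttled API would send."""
    return 200 if bucket.allow(now) else 429


bucket = TokenBucket(rate=1, burst=2)
statuses = [respond(bucket, 0.0) for _ in range(3)]
print(statuses)
```

With a rate of 1 and a burst of 2, the first two requests at the same instant succeed and the third is rejected until the bucket refills.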

Cost Optimization#

(Speed to market vs cost)

  • Pay-per-value pricing model
  • CPU, network, and storage IOPS are allocated based on the memory selection
  • Use key:value tags for the projects to track Lambda billing

Failure Management#

  • Build in retry logic to your FaaS where possible
  • Use Step Functions to minimise try/catch, back-off and retry logic
  • When consuming from Kinesis and DynamoDB, use Lambda error handling and a DLQ on failure
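Where retry logic does live inside the function, it might look roughly like the sketch below: exponential back-off with random jitter around a downstream call. The base delay, cap, and attempt count are assumptions, and `call_with_retries` is a hypothetical helper rather than part of any AWS SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.2, cap: float = 5.0) -> float:
    """Upper bound, in seconds, of the wait after failed attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on failure with jittered exponential back-off."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            # Sketch only: real code should catch just the retryable errors.
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            # Full jitter: wait a random time up to the back-off bound.
            sleep(random.uniform(0, backoff_delay(attempt)))
            attempt += 1
```

The `sleep` parameter is injectable so tests can run without waiting. Retried operations still need to be idempotent, since a call that timed out may have succeeded downstream.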

Performance Efficiency#

  • Selection - perf test to know the right configuration and Edge / regional API decisions
  • Long-running jobs should be passed to Step Functions (no additional cost is incurred, as this is billed per state transition and not time spent in a state)
  • Use global variables to maintain connections to your data stores or other services and resources
  • Use Athena to tune S3 queries (get only the data required) or S3 Select
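The global-variable advice above can be sketched as follows: the client is created once at module scope and reused by later invocations that land on the same warm execution environment. `FakeClient` is a hypothetical stand-in for an SDK client or database connection, used so the example runs without any AWS dependency.

```python
from typing import Any, Dict, Optional


class FakeClient:
    """Hypothetical stand-in for an SDK client or database connection."""

    instances = 0  # counts how many times a "connection" was opened

    def __init__(self) -> None:
        FakeClient.instances += 1

    def get(self, key: str) -> Dict[str, str]:
        return {"key": key}


# Module scope: survives between invocations in the same execution environment.
_client: Optional[FakeClient] = None


def get_client() -> FakeClient:
    """Create the client on first use, then reuse it."""
    global _client
    if _client is None:
        _client = FakeClient()
    return _client


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, str]:
    return get_client().get(event["key"])


handler({"key": "a"})
handler({"key": "b"})
print(FakeClient.instances)
```

Two invocations share one client here. A cold start would create a new one, so the code should not assume the global is always populated.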

Training Options#

AWS WAF course

Architecting serverless solutions
Serverless framework
