A cheats guide to serverless standards (AWS edition)

Scott Griffiths / May 12, 2020

8 min read

Tldr;#

Good practice and design considerations when implementing FaaS and serverless solutions

FaaS 101#

Function-as-a-Service (FaaS) is an event-driven compute execution model in which functions run in stateless containers. Those functions manage server-side logic and state through the use of services.

Serverless 101#

serverless-lens

What Are We Trying to Achieve with Standards#

To have a standardised approach to dev, test, deployment, operations & support for all FaaS applications and serverless services within the business

Approach#

Review the current state of Lambdas being created within the business
Look into the frameworks being used (if any)
Socialise FaaS through training sessions and a guild
Base this around the AWS Well-Architected Framework (WAF) and Serverless Lens whitepapers

The Ideal Result Being#

A collaboration and ownership culture where standards can encapsulate and define how we do:

Reporting / Logging
Alerting and notifications (PagerDuty, Slack etc)
All the tests (Load, Unit, Integration, E2E)
DevSecOps
The use of frameworks
Patterns and sample applications
When to use serverless over conventional practices

For the most part, based around the Serverless Manifesto and the Twelve-Factor App guides

Items Covered#

AWS well architected pillars
General Design Principles
Key AWS available services
AWS service options (cheatsheet)
AWS Serverless Lens good practice

AWS Well-Architected Framework (WAF) pillars - Summarised#

'Thinking cloud natively, providing a consistent approach, and making sure that the foundational areas are thought about, which include:'

Operational Excellence
- Prepare
- Operate
- Evolve
Security
- Identity and access management
- Data/infra protection
- Detective controls
- Incident response
Reliability
- Foundations
- Change Management
- Failure management
Performance Efficiency
- Correct resource types
- Review current/future resource types
- Monitoring
- Trade-offs
Cost optimization
- Cost effective resources
- Match supply Vs demand
- Expenditure awareness
- Optimizing over time

General Design Principles#

The Well-Architected Framework identifies a set of general design principles to facilitate good design in the cloud for serverless applications:

Speedy, simple, singular
Functions are concise, short, single purpose and their environment may live up to their request lifecycle. Transactions are efficiently cost aware and thus faster executions are preferred
Think concurrent requests, not total requests
Serverless applications take advantage of the concurrency model, and tradeoffs at the design level are evaluated based on concurrency.
Share nothing
Function runtime environment and underlying infrastructure are short-lived, therefore local resources such as temporary storage are not guaranteed. State can be manipulated within a state machine execution lifecycle, and persistent storage is preferred for highly durable requirements.
Assume no hardware affinity
Underlying infrastructure may change. Leverage code or dependencies that are hardware-agnostic as CPU flags, for example, may not be available consistently.
Orchestrate your application with state machines, not functions
Chaining Lambda executions within the code to orchestrate the workflow of your application results in a monolithic and tightly coupled application. Instead, use a state machine to orchestrate transactions and communication flows.
Use events to trigger transactions
Events such as writing a new Amazon S3 object or an update to a database allow for transaction execution in response to business functionalities. This asynchronous event behavior is often consumer agnostic and drives just-in-time processing to ensure lean service design.
Design for failures and duplicates
Operations triggered from requests/events must be idempotent as failures can occur and a given request/event can be delivered more than once. Include appropriate retries for downstream calls.

Key AWS Services#

Area	Services available
Reliability	AWS Marketplace, Trusted Advisor, CloudWatch Logs, CloudWatch, API Gateway, Lambda, X-Ray, Step Functions, Amazon SQS, and Amazon SNS
Performance Efficiency	DynamoDB Accelerator, API Gateway, Step Functions, NAT gateway, Amazon VPC, and Lambda
Operational Excellence	AWS Systems Manager Parameter Store, AWS SAM, CloudWatch, AWS CodePipeline, AWS X-Ray, Lambda, and API Gateway

AWS services cheatsheet (Area, Service options, Role)

AWS Serverless Lens#

Come up with Plans to Define and Implement

Metrics to Gather:#

Business Metrics
Customer Experience Metrics
System Metrics
Operational Metrics

Alarming#

AWS Lambda
Duration, Errors, Throttles, and ConcurrentExecutions. For stream-based invocations, alert on IteratorAge. For asynchronous invocations, alert on DeadLetterErrors.
API Gateway
IntegrationLatency, Latency, 5XXError
Application Load Balancer
HTTPCode_ELB_5XX_Count, RejectedConnectionCount, HTTPCode_Target_5XX_Count, UnHealthyHostCount, LambdaInternalError, LambdaUserError
AWS AppSync
5XX and Latency
SQS
ApproximateAgeOfOldestMessage
Kinesis Data Streams
ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded, GetRecords.IteratorAgeMilliseconds, PutRecord.Success, PutRecords.Success (if using Kinesis Producer Library), and GetRecords.Success
SES
Rejects, Bounces, Complaints, Rendering Failures
AWS Step Functions
ExecutionThrottled, ExecutionsFailed, ExecutionsTimedOut
EventBridge
FailedInvocations, ThrottledRules
Amazon S3
5xxErrors, TotalRequestLatency
Amazon DynamoDB
ReadThrottleEvents, WriteThrottleEvents, SystemErrors, ThrottledRequests
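
To make the alarm list a little more concrete, here is a minimal sketch of how an alarm on the Lambda Errors metric might be described in Python. The function name, alarm naming scheme, threshold, and period are assumptions chosen for illustration; the dictionary keys follow the parameters of CloudWatch's `put_metric_alarm` call, which the sketch builds but does not invoke.

```python
def lambda_error_alarm(function_name: str, sns_topic_arn: str, threshold: int = 1) -> dict:
    """Build keyword arguments for a CloudWatch alarm on a function's Errors metric."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming convention
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,  # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no errors
        "AlarmActions": [sns_topic_arn],  # e.g. a topic that forwards to PagerDuty or Slack
    }


# With boto3 installed and credentials configured, the dictionary could be
# passed on as: boto3.client("cloudwatch").put_metric_alarm(**params)
params = lambda_error_alarm("orders-fn", "arn:aws:sns:eu-west-1:123456789012:alerts")
```

Keeping the alarm definition as plain data makes it easy to review and to generate the same alarm for every function.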

Logging (Centralised)#

Include a correlation ID, request identifiers, etc. in every log entry, and use the correct log levels (e.g. 'Info')
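
One possible shape for this is sketched below: each log line is a JSON object carrying a correlation ID and the request ID. The `x-correlation-id` header name and the `log_event` helper are assumptions for the example; `aws_request_id` is the attribute the Lambda context object exposes for the request ID.

```python
import json
import logging
import uuid

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)


def log_event(level: int, message: str, correlation_id: str, **fields) -> str:
    """Emit one JSON log line so a central log store can filter by field."""
    record = {
        "level": logging.getLevelName(level),
        "message": message,
        "correlation_id": correlation_id,
        **fields,
    }
    line = json.dumps(record, sort_keys=True)
    logger.log(level, line)
    return line


def handler(event: dict, context=None) -> dict:
    # Reuse the caller's correlation ID when present so one request can be
    # followed across services; otherwise start a new one.
    headers = event.get("headers") or {}
    correlation_id = headers.get("x-correlation-id") or str(uuid.uuid4())
    request_id = getattr(context, "aws_request_id", None)
    log_event(logging.INFO, "request received", correlation_id, request_id=request_id)
    return {"statusCode": 200, "headers": {"x-correlation-id": correlation_id}}
```

Returning the correlation ID in the response lets downstream callers keep propagating it.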

Tracing - Distributed#

Identify performance degradation and quickly understand anomalies, including latency distributions

Prototyping#

"Every Lambda Should Be Treated as a Microservice for Testing"

The easiest way to test a serverless system as a whole may be to create a separate copy of the system in a non-linked AWS account (or other cloud provider)

Unit | For the most part the same as non serverless
- Creating modularized code as separate functions outside of the handler enables more unit-testable functions
- Mock out the cloud services
Integration | Every Lambda should be treated as a microservice for testing
- Avoid mocks and exercise the active services/functions where possible
Performance | Use Lambda and 3rd party marketplace apps to load test
- Steady and burst rates
- Test async concurrency
End to End
- The entire “distributed system” in a staging style environment with reasonable data
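
The unit-testing advice above (keep logic in separate functions outside the handler) can be sketched as follows. The `calculate_total` function and the event shape are hypothetical; the point is only that the business logic can be tested without any AWS dependency, leaving the handler as a thin wrapper.

```python
import json
import unittest


def calculate_total(items: list) -> float:
    """Pure business logic kept outside the handler so it can be tested without AWS."""
    return round(sum(item["price"] * item["quantity"] for item in items), 2)


def handler(event: dict, context=None) -> dict:
    """Thin handler: parse the event, delegate, format the response."""
    body = json.loads(event["body"])
    total = calculate_total(body["items"])
    return {"statusCode": 200, "body": json.dumps({"total": total})}


class CalculateTotalTest(unittest.TestCase):
    def test_sums_price_times_quantity(self):
        items = [{"price": 2.5, "quantity": 2}, {"price": 1.0, "quantity": 3}]
        self.assertEqual(calculate_total(items), 8.0)

    def test_handler_wraps_total_in_response(self):
        event = {"body": json.dumps({"items": [{"price": 4.0, "quantity": 1}]})}
        response = handler(event)
        self.assertEqual(response["statusCode"], 200)
        self.assertEqual(json.loads(response["body"]), {"total": 4.0})
```

Saved as a module, the tests can be run with `python -m unittest`.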

Deploying#

Use a framework where possible (Serverless Framework/SAM) to model, prototype, build, package, and deploy, and parameterize the app and its dependencies
Use synthetic traffic, custom metrics, alerts, and canaries as part of a rollout deployment
Separate environments by account (Dev, UAT, PROD)

Security#

Identity and Access Management#

Don't

❌ API Gateway API keys are not a security mechanism and should not be used for authorization unless it’s a public API
❌ Sharing one IAM role across more than one Lambda function is not recommended, as it will likely violate least-privileged access

Do

✅ Use Amazon API Gateway resource policies
✅ Use least-privileged access and only allow the access needed to perform a given operation.
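
As a small illustration of least-privileged access, the sketch below builds an IAM policy document that allows only read operations on one specific DynamoDB table. The function name, the choice of actions, and the example ARN are assumptions; the document structure (`Version`, `Statement`, `Effect`, `Action`, `Resource`) is the standard IAM policy format.

```python
import json


def read_only_table_policy(table_arn: str) -> dict:
    """Policy for a function that only needs to read from a single table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the operations this hypothetical function performs,
                # rather than a wildcard such as "dynamodb:*".
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                # One named table rather than "*".
                "Resource": table_arn,
            }
        ],
    }


policy = read_only_table_policy("arn:aws:dynamodb:eu-west-1:123456789012:table/orders")
policy_json = json.dumps(policy, indent=2)
```

Giving each function its own narrowly scoped policy like this keeps the impact of a compromised function small.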

Infra Protection#

Favor dynamic authentication, such as temporary credentials with AWS IAM over static keys. API Gateway and AWS AppSync both support IAM Authorization

Data Protection#

Watch out when writing to stdout, as this may print sensitive info to CloudWatch
Use a secrets manager that allows for rotation
Encrypt any sensitive data traversing your serverless application
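
One way to reduce the risk of leaking sensitive values through stdout is to redact known fields before anything is logged. The sketch below assumes a hypothetical list of sensitive key names; a real list would depend on the data your application handles.

```python
# Hypothetical set of field names to mask; adjust to the application's data.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number"}


def redact(value):
    """Return a copy of the value with sensitive dictionary fields masked."""
    if isinstance(value, dict):
        return {
            key: "***" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


event = {"user": "sam", "password": "hunter2", "headers": {"Authorization": "Bearer abc"}}
safe_event = redact(event)  # log safe_event, never the raw event
```

The original event is left untouched, so the handler can still use the real values.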

Reliability#

Regulate Inbound request rates
Throttle at the API level based on access patterns (return 429)
Issue API keys to clients with usage plans
Look at concurrency controls where needed, and for single shard look at using Kinesis data streams
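
API Gateway applies throttling for you, so the following is not something you would deploy; it is only a sketch of the token bucket idea behind rate and burst limits, to show why a burst of requests starts receiving 429 responses. The class, its parameters, and the explicit `now` argument are assumptions made to keep the example deterministic.

```python
class TokenBucket:
    """Allow `burst` requests at once, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Refill according to elapsed time, capped at the burst capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def status_for(bucket: TokenBucket, now: float) -> int:
    """Map the throttling decision to an HTTP status code."""
    return 200 if bucket.allow(now) else 429


bucket = TokenBucket(rate=1, burst=2)
statuses = [status_for(bucket, now) for now in (0.0, 0.0, 0.0, 1.0)]
```

Here the third request in the same instant exceeds the burst and is rejected, while a request one second later succeeds because a token has been refilled.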

Cost Optimization#

(speed to Market Vs Cost)

pay-per-value pricing model
CPU, network, and storage IOPS are allocated in proportion to the memory selected
Use key:value tags on projects to track Lambda billing
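
Because compute is billed by memory multiplied by duration, a quick calculation can help when weighing memory settings. The sketch below is a simplified model with the prices passed in as parameters; the values used in the example call are placeholders rather than current AWS prices, and free-tier allowances are ignored.

```python
def monthly_lambda_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_gb_second: float,
    price_per_million_requests: float,
) -> float:
    """Estimate cost as compute (GB-seconds) plus a per-request charge."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = (invocations / 1_000_000) * price_per_million_requests
    return round(compute_cost + request_cost, 2)


# Placeholder prices for illustration only; check the current pricing page.
small = monthly_lambda_cost(1_000_000, 200, 512, 0.00001, 0.2)
large = monthly_lambda_cost(1_000_000, 100, 1024, 0.00001, 0.2)
```

In this model, doubling the memory costs the same if it halves the duration, which is why performance testing different memory settings is worthwhile.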

Failure Management#

Build in retry logic to your FaaS where possible
Use Step functions to minimise try/catch, back-off and retry logic
When consuming from Kinesis and DynamoDB, use Lambda error handling and a DLQ on failure
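
Where retry logic does live inside a function, exponential backoff with jitter is a common approach. The helper below is a minimal sketch; its name, defaults, and the injectable `sleep` parameter are assumptions for the example, and in practice `retry_on` should be narrowed to the errors that are actually transient.

```python
import random
import time


def call_with_retries(
    operation,
    max_attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: tuple = (Exception,),
    sleep=time.sleep,
):
    """Call `operation`, retrying failures with exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error (e.g. to a DLQ)
            # The cap doubles with each attempt; a random delay below the cap
            # avoids many clients retrying at the same moment.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

A call such as `call_with_retries(lambda: client.get(url))` would wrap a hypothetical downstream request; the operation being retried should itself be idempotent.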

Performance Efficiency#

Selection - perf test to know the right configuration and Edge / regional API decisions
Long running jobs should be passed to Step Functions (no additional cost is incurred while waiting, as this is billed per state transition and not time spent in a state)
Use global variables to maintain connections to your data stores or other services and resources
Use Athena or S3 Select to tune S3 queries (get only the data required)

Training Options#

AWS WAF course

Architecting serverless solutions

Serverless framework
