A cheat's guide to serverless standards (AWS edition)

TL;DR
Good practice and design considerations for implementing FaaS and serverless solutions
FaaS 101
Function-as-a-Service, or FaaS, is an event-driven execution model in which functions run in stateless containers and manage server-side logic and state through the use of services.
Serverless 101

What are we trying to achieve with standards
To have a standardised approach to development, testing, deployment, operations and support for all FaaS applications and serverless services within the business.
Approach
- Review the current state of the Lambda functions being created within the business
- Look into the frameworks being used (if any)
- Socialise FaaS through training sessions and a guild
- Base this on the AWS Well-Architected Framework (WAF) and Serverless Lens whitepapers
The ideal result being
A collaboration and ownership culture where standards can encapsulate and define how we do:
- Reporting / Logging
- Alerting and notifications (PagerDuty, Slack, etc.)
- All the tests (load, unit, integration, E2E)
- DevSecOps
- The use of frameworks
- Patterns and sample applications
- When to use serverless over conventional practices
For the most part, these are based on the Serverless Manifesto and the Twelve-Factor App guidelines.
Items covered
- AWS Well-Architected pillars
- General Design Principles
- Key AWS available services
- AWS service options (cheatsheet)
- AWS Serverless Lens good practice

AWS Well-Architected Framework (WAF) pillars - summarised
'Thinking cloud-natively, providing a consistent approach and making sure that the following foundational areas are considered'
- Operational Excellence
  - Prepare
  - Operate
  - Evolve
- Security
  - Identity and access management
  - Data and infrastructure protection
  - Detective controls
  - Incident response
- Reliability
  - Foundations
  - Change management
  - Failure management
- Performance Efficiency
  - Correct resource types
  - Review of current/future resource types
  - Monitoring
  - Trade-offs
- Cost Optimization
  - Cost-effective resources
  - Matching supply and demand
  - Expenditure awareness
  - Optimizing over time
General Design Principles
The Well-Architected Framework identifies a set of general design principles to facilitate good design in the cloud for serverless applications:
- Speedy, simple, singular
Functions are concise, short and single-purpose, and their environment may live no longer than their request lifecycle. Transactions are cost-aware, so faster executions are preferred.
- Think concurrent requests, not total requests
Serverless applications take advantage of the concurrency model, and tradeoffs at the design level are evaluated based on concurrency.
- Share nothing
Function runtime environment and underlying infrastructure are short-lived, therefore local resources such as temporary storage are not guaranteed. State can be manipulated within a state machine execution lifecycle, and persistent storage is preferred for highly durable requirements.
- Assume no hardware affinity
Underlying infrastructure may change. Use code and dependencies that are hardware-agnostic, as CPU flags, for example, may not be available consistently.
- Orchestrate your application with state machines, not functions
Chaining Lambda executions within the code to orchestrate the workflow of your application results in a monolithic and tightly coupled application. Instead, use a state machine to orchestrate transactions and communication flows.
- Use events to trigger transactions
Events such as writing a new Amazon S3 object or an update to a database allow for transaction execution in response to business functionalities. This asynchronous event behavior is often consumer agnostic and drives just-in-time processing to ensure lean service design.
- Design for failures and duplicates
Operations triggered from requests/events must be idempotent as failures can occur and a given request/event can be delivered more than once. Include appropriate retries for downstream calls.
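Idempotency can be sketched as "claim the event ID before doing side effects, and treat a second claim as a duplicate". The example below assumes each event carries a unique `id` field (a hypothetical shape) and uses an in-memory set as a stand-in for a durable store such as a DynamoDB table written with a conditional put; in-memory state is not shared between execution environments, so it is for illustration only.

```python
from typing import Any, Callable, Dict, Set


class InMemoryIdempotencyStore:
    """Stand-in for a durable store such as a DynamoDB table written with a
    conditional put. In-memory state is NOT shared between Lambda execution
    environments, so this class is for illustration only."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def claim(self, key: str) -> bool:
        """Return True the first time a key is claimed, False for repeats."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True

    def release(self, key: str) -> None:
        """Forget a key so a failed event can be retried."""
        self._seen.discard(key)


def make_idempotent_handler(
    process: Callable[[Dict[str, Any]], Any],
    store: InMemoryIdempotencyStore,
) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Wrap `process` so each event ID is successfully processed at most once."""

    def handler(event: Dict[str, Any]) -> Dict[str, Any]:
        event_id = event["id"]  # hypothetical unique ID field on the event
        if not store.claim(event_id):
            # Duplicate delivery: acknowledge without repeating side effects
            return {"status": "duplicate", "id": event_id}
        try:
            result = process(event)
        except Exception:
            store.release(event_id)  # let a retry of this event run again
            raise
        return {"status": "processed", "id": event_id, "result": result}

    return handler
```

Releasing the claim on failure matters: without it, a retried event would be mistaken for a duplicate and silently dropped. A real Lambda handler also receives a `context` argument, omitted here for brevity.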
Key AWS Services

| Area | Services available |
|---|---|
| Reliability | AWS Marketplace, Trusted Advisor, CloudWatch Logs, CloudWatch, API Gateway, Lambda, X-Ray, Step Functions, Amazon SQS, and Amazon SNS |
| Performance Efficiency | DynamoDB Accelerator, API Gateway, Step Functions, NAT gateway, Amazon VPC, and Lambda |
| Operational Excellence | AWS Systems Manager Parameter Store, AWS SAM, CloudWatch, AWS CodePipeline, AWS X-Ray, Lambda, and API Gateway |
AWS services cheatsheet (Area, Service options, Role)

AWS Serverless Lens
Come up with plans to define and implement
Metrics to gather:
- Business Metrics
- Customer Experience Metrics
- System Metrics
- Operational Metrics
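Business and customer experience metrics can be emitted without extra API calls by writing structured log lines in the CloudWatch Embedded Metric Format (EMF), which CloudWatch Logs extracts into metrics. A minimal sketch follows; the namespace `MyApp`, the metric `OrdersPlaced` and the `Service` dimension are hypothetical, and the record layout should be checked against the EMF specification before relying on it.

```python
import json
import time
from typing import Dict


def emf_record(namespace: str, metric: str, value: float, unit: str,
               dimensions: Dict[str, str]) -> str:
    """Build one CloudWatch Embedded Metric Format (EMF) log line."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # A single dimension set made of all supplied dimension keys
                    "Dimensions": [list(dimensions.keys())],
                    "Metrics": [{"Name": metric, "Unit": unit}],
                }
            ],
        },
        # Dimension values and the metric value sit at the top level
        **dimensions,
        metric: value,
    }
    return json.dumps(record)


if __name__ == "__main__":
    # Hypothetical business metric: one order placed via the "checkout" service.
    # In Lambda, stdout is shipped to CloudWatch Logs, where EMF is parsed.
    print(emf_record("MyApp", "OrdersPlaced", 1, "Count", {"Service": "checkout"}))
```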
Alarming
| Service | Metrics to alarm on |
|---|---|
| AWS Lambda | Duration, Errors, Throttles, and ConcurrentExecutions. For stream-based invocations, alert on IteratorAge; for asynchronous invocations, alert on DeadLetterErrors |
| API Gateway | IntegrationLatency, Latency, 5XXError |
| Application Load Balancer | HTTPCode_ELB_5XX_Count, RejectedConnectionCount, HTTPCode_Target_5XX_Count, UnHealthyHostCount, LambdaInternalError, LambdaUserError |
| AWS AppSync | 5XXError, Latency |
| Amazon SQS | ApproximateAgeOfOldestMessage |
| Kinesis Data Streams | ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded, GetRecords.IteratorAgeMilliseconds, PutRecord.Success, PutRecords.Success (if using the Kinesis Producer Library), and GetRecords.Success |
| Amazon SES | Rejects, Bounces, Complaints, Rendering Failures |
| AWS Step Functions | ExecutionThrottled, ExecutionsFailed, ExecutionsTimedOut |
| Amazon EventBridge | FailedInvocations, ThrottledRules |
| Amazon S3 | 5xxErrors, TotalRequestLatency |
| Amazon DynamoDB | ReadThrottleEvents, WriteThrottleEvents, SystemErrors, ThrottledRequests |
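As a sketch of turning the Lambda `Errors` metric into an alarm, the helper below builds the keyword arguments for CloudWatch's `PutMetricAlarm` API (exposed in boto3 as `put_metric_alarm`). The alarm name format, the 60-second period, the threshold, the function name and the SNS topic ARN are assumptions to adapt, not recommended values.

```python
from typing import Any, Dict, List


def lambda_error_alarm(function_name: str, threshold: int,
                       alarm_actions: List[str]) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch PutMetricAlarm on a function's Errors metric."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming convention
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,  # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No data usually means no invocations, so don't alarm on it
        "TreatMissingData": "notBreaching",
        # e.g. an SNS topic that fans out to PagerDuty / Slack
        "AlarmActions": alarm_actions,
    }


# Usage (requires boto3 and AWS credentials; names are placeholders):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **lambda_error_alarm("orders-fn", 1, ["arn:aws:sns:eu-west-1:123456789012:alerts"])
# )
```

Keeping alarm definitions as data like this also makes them easy to move into a SAM or CloudFormation template later.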
Logging (Centralised)
Include a correlation ID, request identifiers, etc. in every log entry, and use the correct log levels (e.g. INFO).
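A minimal sketch of structured, centralisable logging: every line is JSON carrying a correlation ID that is reused from the caller when present and generated otherwise. The header name `x-correlation-id`, the logger name and the field names are hypothetical conventions, not AWS standards.

```python
import json
import logging
import uuid
from typing import Any, Dict, Optional

# The Lambda Python runtime attaches a handler to the root logger; locally,
# call logging.basicConfig(level=logging.INFO) to see the output.
logger = logging.getLogger("orders")  # hypothetical logger name
logger.setLevel(logging.INFO)


def log_json(level: int, message: str, correlation_id: str, **fields: Any) -> str:
    """Emit one structured JSON log line at the given level and return it."""
    line = json.dumps({
        "level": logging.getLevelName(level),
        "message": message,
        "correlationId": correlation_id,
        **fields,
    })
    logger.log(level, line)
    return line


def get_correlation_id(headers: Optional[Dict[str, str]]) -> str:
    """Reuse an incoming correlation ID if present, otherwise start a new one."""
    if headers and "x-correlation-id" in headers:  # hypothetical header name
        return headers["x-correlation-id"]
    return str(uuid.uuid4())


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    cid = get_correlation_id(event.get("headers"))
    log_json(logging.INFO, "order received", cid,
             requestId=getattr(context, "aws_request_id", None))
    # Pass the ID downstream so the next service logs the same value
    return {"statusCode": 200, "headers": {"x-correlation-id": cid}}
```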

Tracing - Distributed
Identify performance degradation and quickly understand anomalies, including latency distributions
Prototyping
"Every Lambda should be treated as a microservice for testing"
The easiest way to test a serverless system as a whole may be to deploy a separate copy of it in an unlinked AWS account (or with another cloud provider).
- Unit: for the most part the same as for non-serverless code
  - Creating modularised code as separate functions outside of the handler makes more of the code unit-testable
  - Mock out the cloud services
- Integration: every Lambda should be treated as a microservice for testing
  - Avoid mocks and exercise the real deployed services/functions where possible
- Performance: use Lambda and third-party AWS Marketplace apps to load test
  - Test steady and burst rates
  - Test async concurrency
- End to end
  - Test the entire "distributed system" in a staging-style environment with reasonable data
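The unit-testing advice (logic outside the handler, cloud services mocked) can be sketched with the standard library's `unittest`. `build_order`, `handle`, `OrderTable` and `FakeTable` are hypothetical names; in a real function the injected `table` would wrap something like a DynamoDB client created outside the handler.

```python
import unittest
from typing import Any, Dict, List, Protocol


class OrderTable(Protocol):
    """The only part of the storage client the code depends on (hypothetical)."""

    def put(self, item: Dict[str, Any]) -> None: ...


def build_order(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Pure business logic kept outside the handler so it is easy to unit test."""
    if payload.get("quantity", 0) <= 0:
        raise ValueError("quantity must be positive")
    return {"sku": payload["sku"], "quantity": payload["quantity"]}


def handle(event: Dict[str, Any], table: OrderTable) -> Dict[str, Any]:
    """Thin handler: parse, delegate to the logic, call the injected service."""
    order = build_order(event["body"])
    table.put(order)
    return {"statusCode": 201}


class FakeTable:
    """Mock standing in for the cloud service during unit tests."""

    def __init__(self) -> None:
        self.items: List[Dict[str, Any]] = []

    def put(self, item: Dict[str, Any]) -> None:
        self.items.append(item)


class HandlerTest(unittest.TestCase):
    def test_valid_order_is_stored(self) -> None:
        table = FakeTable()
        resp = handle({"body": {"sku": "A1", "quantity": 2}}, table)
        self.assertEqual(resp["statusCode"], 201)
        self.assertEqual(table.items, [{"sku": "A1", "quantity": 2}])

    def test_rejects_zero_quantity(self) -> None:
        with self.assertRaises(ValueError):
            build_order({"sku": "A1", "quantity": 0})
```

Run it with `python -m unittest <module_name>`. Integration tests would swap `FakeTable` for a client pointed at the real deployed resources.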
Deploying
- Use a framework where possible (Serverless Framework/AWS SAM) to model, prototype, build, package, and deploy, parameterising the app and its dependencies
- Use synthetic traffic, custom metrics, and alerts as part of rollout deployments, and use canary deployments
- Separate environments by account (Dev, UAT, PROD)
Security
Identity and Access Management
Don't
- ❌ API Gateway API keys are not a security mechanism and should not be used for authorization unless it's a public API
- ❌ Sharing one IAM role across more than one Lambda function is not recommended, as it will likely violate least-privileged access
Do
- ✅ Use Amazon API Gateway resource policies
- ✅ Use least-privileged access and only allow the access needed to perform a given operation.
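Least privilege means granting only the actions a function actually performs, on only the resources it touches. The helper below sketches building such an IAM policy document for a single DynamoDB table; the table ARN and account ID are placeholders, and rejecting wildcards is a convention assumed here, not an IAM rule.

```python
import json
from typing import Any, Dict, List


def least_privilege_policy(table_arn: str, actions: List[str]) -> Dict[str, Any]:
    """IAM policy document allowing only the listed actions on one table."""
    if table_arn == "*":
        raise ValueError("resource wildcard not allowed")
    for action in actions:
        # Guard against broad grants such as "dynamodb:*" creeping in
        if "*" in action:
            raise ValueError(f"wildcard action not allowed: {action}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": table_arn,  # a single table, never "*"
            }
        ],
    }


if __name__ == "__main__":
    # A read-only function gets GetItem on one (placeholder) table and nothing else
    arn = "arn:aws:dynamodb:eu-west-1:123456789012:table/Orders"
    print(json.dumps(least_privilege_policy(arn, ["dynamodb:GetItem"]), indent=2))
```

Generating one such policy per function also fits the "one role per function" rule above.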
Infrastructure protection
Favor dynamic authentication, such as temporary credentials with AWS IAM, over static keys. API Gateway and AWS AppSync both support IAM authorization.
Data Protection
- Be careful when writing to stdout: anything printed there is sent to CloudWatch Logs and may expose sensitive information
- Use a secrets manager that allows for rotation
- Encrypt any sensitive data traversing your serverless application
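One way to keep sensitive data out of CloudWatch Logs is to redact known-sensitive fields before anything is printed. A minimal sketch follows; the set of sensitive key names is a hypothetical example to adapt to your own data model, and key matching is case-insensitive by assumption.

```python
import json
from typing import Any

# Hypothetical field names treated as sensitive; adapt to your data model
SENSITIVE_KEYS = {"password", "authorization", "card_number", "ssn"}


def redact(value: Any) -> Any:
    """Return a copy with values of sensitive keys replaced, recursing into
    nested dicts and lists. The input itself is left unchanged."""
    if isinstance(value, dict):
        return {
            k: "***REDACTED***" if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def safe_print(event: Any) -> None:
    # stdout from a Lambda function is shipped to CloudWatch Logs
    print(json.dumps(redact(event)))
```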
Reliability
- Regulate inbound request rates
- Throttle at the API level based on access patterns (return HTTP 429 Too Many Requests)
- Issue API keys to clients with usage plans
- Look at concurrency controls where needed, and consider Kinesis Data Streams where single-shard processing is required
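API Gateway describes its throttling in terms of a token bucket with a steady rate and a burst size. The sketch below illustrates that algorithm in-process, returning 429 when a caller runs out of tokens; it is not API Gateway's implementation, the rate and burst values are assumptions, and state held in memory is per execution environment, so stage or usage-plan throttling remains the normal way to do this.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def throttled_response(bucket: TokenBucket) -> Dict[str, object]:
    """Return HTTP 429 when the caller has exhausted its tokens."""
    if bucket.allow():
        return {"statusCode": 200}
    return {"statusCode": 429, "body": "Too Many Requests"}
```

With `rate=1.0` and `burst=2`, three back-to-back requests yield 200, 200, 429, and one second later a single further request is allowed.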
Cost Optimization
(Speed to market vs cost)
- Pay-per-value pricing model
- CPU, network, and storage IOPS are allocated in proportion to the memory setting
- Use key:value tags per project to track Lambda billing
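Lambda bills per request plus compute measured in GB-seconds (memory allocated × duration), so the memory setting drives both performance and cost. The estimate below is a sketch: prices are deliberately parameters (look them up for your region and architecture), and it ignores the free tier, duration rounding and other charges such as ephemeral storage.

```python
from dataclasses import dataclass


@dataclass
class LambdaPricing:
    """Prices are inputs, not constants: take current values from the
    AWS Lambda pricing page for your region and architecture."""
    per_request: float    # price for a single request
    per_gb_second: float  # price for one GB-second of compute


def monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                 pricing: LambdaPricing) -> float:
    """Rough monthly request + compute cost (no free tier, no rounding)."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return invocations * pricing.per_request + gb_seconds * pricing.per_gb_second
```

For example, 1,000 invocations at 1,024 MB for 1,000 ms and 1,000 invocations at 512 MB for 2,000 ms both consume 1,000 GB-seconds, so extra memory costs nothing extra when it shortens duration proportionally; this is why performance testing different memory settings is worthwhile.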
Failure management
- Build retry logic into your functions where possible
- Use Step Functions to minimise custom try/catch, back-off and retry logic in code
- When consuming from Kinesis and DynamoDB Streams, use Lambda's error handling and a DLQ for failures
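Where retry logic does live in function code, exponential backoff with jitter avoids synchronised retry storms against a struggling downstream service. A sketch using "full jitter" follows; the attempt count and delays are assumptions, and the total time spent retrying must stay well within the function's timeout.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on retryable errors using capped exponential backoff
    with full jitter; non-retryable errors propagate immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            if attempt == max_attempts:
                raise  # give up; Lambda's own retry / DLQ handling takes over
            # Full jitter: sleep a random time between 0 and the capped bound
            bound = min(max_delay, base_delay * (2 ** (attempt - 1)))
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function unit-testable without real delays.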
Performance Efficiency
- Selection: performance-test to find the right configuration and to decide between edge-optimized and regional API endpoints
- Pass long-running jobs to Step Functions (waiting incurs no additional cost with Standard Workflows, which are billed per state transition rather than for time spent in a state)
- Use global variables to maintain connections to your data stores and other services and resources
- Use Athena or S3 Select to tune S3 queries so that only the required data is retrieved
Training options
- AWS WAF course
- Architecting serverless solutions
- Serverless Framework