A cheats guide to serverless standards (AWS edition)
Scott Griffiths / May 12, 2020
Tldr;#
Good practice / design considerations when it comes to implementing FaaS and serverless solutions
FaaS 101#
Function-as-a-Service, or FaaS, is an event-driven computing execution model in which functions run in stateless containers. Those functions manage server-side logic and state through the use of services.
Serverless 101#
What Are We Trying to Achieve with Standards#
To have a standardised approach to dev, test, deployment, operations & support for all FaaS applications and serverless services within the business
Approach#
- Review the current state of Lambdas being created within the business
- A look into the frameworks that were being used (if any)
- Socialise FaaS through trainings and a guild
- Base this around the AWS Well-Architected Framework (WAF) and Serverless Lens whitepapers
The Ideal Result Being#
A collaboration and ownership culture where standards can encapsulate and define how we do:
- Reporting / Logging
- Alerting and notifications (PagerDuty, Slack etc)
- All the tests (Load, Unit, Integration, E2E)
- DevSecOps
- The use of frameworks
- Patterns and sample applications
- When to use serverless over conventional practices
For the most part, this is based around the Serverless Manifesto and the Twelve-Factor App guides
Items Covered#
- AWS well architected pillars
- General Design Principles
- Key AWS available services
- AWS service options (cheatsheet)
- AWS Serverless Lens good practice
AWS (WAF) pillars - Summarised#
'Thinking cloud natively, providing a consistent approach, and making sure that foundational areas are thought about, which include:'
- Operational Excellence
  - Prepare
  - Operate
  - Evolve
- Security
  - Identity and access management
  - Data/infra protection
  - Detective controls
  - Incident response
- Reliability
  - Foundations
  - Change management
  - Failure management
- Performance Efficiency
  - Correct resource types
  - Review current/future resource types
  - Monitoring
  - Trade-offs
- Cost Optimization
  - Cost-effective resources
  - Match supply vs demand
  - Expenditure awareness
  - Optimizing over time
General Design Principles#
The Well-Architected Framework identifies a set of general design principles to facilitate good design in the cloud for serverless applications:
- Speedy, simple, singular
Functions are concise, short, single purpose and their environment may live up to their request lifecycle. Transactions are efficient and cost aware, and thus faster executions are preferred.
- Think concurrent requests, not total requests
Serverless applications take advantage of the concurrency model, and tradeoffs at the design level are evaluated based on concurrency.
- Share nothing
Function runtime environment and underlying infrastructure are short-lived, therefore local resources such as temporary storage are not guaranteed. State can be manipulated within a state machine execution lifecycle, and persistent storage is preferred for highly durable requirements.
- Assume no hardware affinity
Underlying infrastructure may change. Leverage code or dependencies that are hardware-agnostic as CPU flags, for example, may not be available consistently.
- Orchestrate your application with state machines, not functions
Chaining Lambda executions within the code to orchestrate the workflow of your application results in a monolithic and tightly coupled application. Instead, use a state machine to orchestrate transactions and communication flows.
- Use events to trigger transactions
Events such as writing a new Amazon S3 object or an update to a database allow for transaction execution in response to business functionalities. This asynchronous event behavior is often consumer agnostic and drives just-in-time processing to ensure lean service design.
- Design for failures and duplicates
Operations triggered from requests/events must be idempotent as failures can occur and a given request/event can be delivered more than once. Include appropriate retries for downstream calls.
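To make the "failures and duplicates" principle concrete, here is a minimal sketch of an idempotent handler. The `handle_once` helper, the in-memory `_processed` store and the `charge` function are hypothetical names used only for illustration; a real function would need a durable store (a database table, for example), because the execution environment is short-lived.

```python
from typing import Any, Callable, Dict

# Hypothetical in-memory store, for illustration only. Lambda environments
# are short-lived, so a real implementation would use durable storage.
_processed: Dict[str, Any] = {}


def handle_once(event_id: str, payload: Dict[str, Any],
                process: Callable[[Dict[str, Any]], Any]) -> Any:
    """Run `process` at most once per event_id and replay the stored result."""
    if event_id in _processed:
        # Duplicate delivery: return the earlier result, skip the side effect.
        return _processed[event_id]
    result = process(payload)
    _processed[event_id] = result
    return result


# Example side effect that must not happen twice.
calls = []


def charge(payload: Dict[str, Any]) -> Dict[str, Any]:
    calls.append(payload["amount"])
    return {"charged": payload["amount"]}


first = handle_once("evt-1", {"amount": 10}, charge)
second = handle_once("evt-1", {"amount": 10}, charge)  # same event delivered again
```

The second call returns the stored result without invoking `charge` again, which is the property a retried or re-delivered event relies on.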
Key AWS Services#
Area | Services available |
---|---|
Reliability | AWS Marketplace, Trusted Advisor, CloudWatch Logs, CloudWatch, API Gateway, Lambda, X-Ray, Step Functions, Amazon SQS, and Amazon SNS |
Performance Efficiency | DynamoDB Accelerator, API Gateway, Step Functions, NAT gateway, Amazon VPC, and Lambda |
Operational Excellence | AWS Systems Manager Parameter Store, AWS SAM, CloudWatch, AWS CodePipeline, AWS X-Ray, Lambda, and API Gateway |
AWS services cheatsheet (Area, Service options, Role)
AWS Serverless Lens#
Come up with Plans to Define and Implement
Metrics to Gather:#
- Business Metrics
- Customer Experience Metrics
- System Metrics
- Operational Metrics
Alarming#
AWS Lambda
Duration, Errors, Throttles, and ConcurrentExecutions. For stream-based invocations, alert on IteratorAge. For asynchronous invocations, alert on DeadLetterErrors.
API Gateway
IntegrationLatency, Latency, 5XXError
Application Load Balancer
HTTPCode_ELB_5XX_Count, RejectedConnectionCount, HTTPCode_Target_5XX_Count, UnHealthyHostCount, LambdaInternalError, LambdaUserError
AWS AppSync
5XX and Latency
SQS
ApproximateAgeOfOldestMessage
Kinesis Data Streams
ReadProvisionedThroughputExceeded, WriteProvisionedThroughputExceeded, GetRecords.IteratorAgeMilliseconds, PutRecord.Success, PutRecords.Success (if using Kinesis Producer Library), and GetRecords.Success
SES
Rejects, Bounces, Complaints, Rendering Failures
AWS Step Functions
ExecutionThrottled, ExecutionsFailed, ExecutionsTimedOut
EventBridge
FailedInvocations, ThrottledRules
Amazon S3
5xxErrors, TotalRequestLatency
Amazon DynamoDB
ReadThrottleEvents, WriteThrottleEvents, SystemErrors
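As a rough sketch of how one of these alarms could be defined in code, the helper below builds the keyword arguments for CloudWatch's PutMetricAlarm API for the Lambda `Errors` metric. The helper name, the alarm naming scheme, the one-minute period and the default threshold are assumptions for illustration, not recommended values.

```python
from typing import Any, Dict


def lambda_error_alarm(function_name: str, threshold: float = 1.0) -> Dict[str, Any]:
    """Build PutMetricAlarm arguments for a Lambda function's Errors metric.

    The naming scheme, period and threshold here are illustrative assumptions.
    """
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,               # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations is not an error
    }


params = lambda_error_alarm("orders-api")  # "orders-api" is a made-up name
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`; the sketch stops short of calling AWS so it stays runnable offline.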
Logging (Centralised)#
Include a CorrelationId, request identifiers etc. in every log entry, and use the correct log levels (e.g. ‘Info’)
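One way this could look is a small helper that emits one JSON object per log line and always carries the correlation ID. This is only a sketch: the `x-correlation-id` header name, the `log_line` and `correlation_id_from` helpers and the field names are assumptions, not an established convention of any AWS service.

```python
import json
import logging
import uuid
from typing import Any, Dict

logger = logging.getLogger("app")


def correlation_id_from(event: Dict[str, Any]) -> str:
    """Reuse the caller's correlation ID if present, otherwise create one.

    The header name is an assumption made for this example.
    """
    headers = event.get("headers") or {}
    return headers.get("x-correlation-id") or str(uuid.uuid4())


def log_line(level: int, message: str, correlation_id: str, **fields: Any) -> str:
    """Log one JSON object per line so a central log store can index it."""
    entry = {
        "level": logging.getLevelName(level),
        "message": message,
        "correlation_id": correlation_id,
        **fields,
    }
    line = json.dumps(entry, sort_keys=True)
    logger.log(level, line)
    return line


cid = correlation_id_from({"headers": {"x-correlation-id": "abc-123"}})
line = log_line(logging.INFO, "order created", cid, order_id=42)
```

Passing the same ID on to downstream calls lets a centralised log search stitch together one request across several functions.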
Tracing - Distributed#
Identify performance degradation and quickly understand anomalies, including latency distributions
Prototyping#
"Every Lambda Should Be Treated as a Microservice for Testing"
The easiest way to test a serverless system as a whole may be to generate a separate system in a non-linked AWS account (or other cloud provider).
- Unit | For the most part the same as non-serverless
  - Writing modularized code as separate functions outside of the handler makes functions more unit-testable
  - Mock out the cloud services
- Integration | Every Lambda should be treated as a microservice for testing
  - Avoid mocks and exercise the live services/functions where possible
- Performance | Use Lambda and third-party marketplace apps to load test
  - Steady and burst rates
  - Test async concurrency
- End to End
  - The entire “distributed system” in a staging-style environment with reasonable data
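The unit-testing points above can be sketched as follows: the business logic lives outside the handler and the cloud service is replaced by a mock. The `Store` interface, `save_order` and `make_handler` are hypothetical names invented for this example; the mock comes from the standard library's `unittest.mock`.

```python
from typing import Any, Callable, Dict, Protocol
from unittest.mock import Mock


class Store(Protocol):
    """Hypothetical interface standing in for a cloud data store client."""

    def put(self, key: str, value: Dict[str, Any]) -> None: ...


def save_order(store: Store, order: Dict[str, Any]) -> Dict[str, Any]:
    """Business logic kept outside the handler so it can be tested directly."""
    if order.get("quantity", 0) <= 0:
        return {"statusCode": 400, "body": "quantity must be positive"}
    store.put(order["id"], order)
    return {"statusCode": 201, "body": order["id"]}


def make_handler(store: Store) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Thin handler: unpack the event and delegate to the logic."""
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        return save_order(store, event["order"])
    return handler


def test_save_order_writes_to_store() -> None:
    store = Mock()  # the cloud service is mocked out
    response = make_handler(store)({"order": {"id": "o-1", "quantity": 2}}, None)
    assert response["statusCode"] == 201
    store.put.assert_called_once_with("o-1", {"id": "o-1", "quantity": 2})


def test_save_order_rejects_bad_quantity() -> None:
    store = Mock()
    response = save_order(store, {"id": "o-2", "quantity": 0})
    assert response["statusCode"] == 400
    store.put.assert_not_called()
```

The same `make_handler` could be given a real client in an integration test, which keeps the mock confined to the unit level.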
Deploying#
- Use a framework where possible (Serverless Framework/SAM) to model, prototype, build, package, and deploy (parameterize the app and its dependencies)
- Use synthetic traffic, custom metrics, and alerts as part of a rollout deployment, together with canaries
- Separate environments by account (Dev, UAT, PROD)
Security#
Identity and Access Management#
Don't
- ❌ API Gateway API keys are not a security mechanism and should not be used for authorization unless it’s a public API
- ❌ Sharing one IAM role across more than one Lambda function is not recommended
Do
- ✅ Use Amazon API Gateway resource policies
- ✅ Use least-privileged access and only allow the access needed to perform a given operation.
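As an illustration of least-privileged access, the sketch below builds an IAM policy document that allows only two read actions on a single table, plus a small check for wildcards. The helper names, the choice of actions and the example ARN (with a placeholder account ID) are assumptions made for this example.

```python
from typing import Any, Dict


def read_only_table_policy(table_arn: str) -> Dict[str, Any]:
    """IAM policy document granting read access to one DynamoDB table only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the operations this function actually performs.
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            }
        ],
    }


def has_wildcards(policy: Dict[str, Any]) -> bool:
    """Flag statements whose actions or resources contain '*'."""
    for statement in policy["Statement"]:
        for key in ("Action", "Resource"):
            values = statement[key]
            values = [values] if isinstance(values, str) else values
            if any("*" in value for value in values):
                return True
    return False


# Placeholder region, account ID and table name, for illustration only.
policy = read_only_table_policy(
    "arn:aws:dynamodb:eu-west-1:123456789012:table/orders"
)
```

A check like `has_wildcards` could run in a pipeline to catch overly broad policies before deployment.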
Infra Protection#
Favor dynamic authentication, such as temporary credentials with AWS IAM, over static keys. API Gateway and AWS AppSync both support IAM authorization.
Data Protection#
- Watch out when writing to stdout. This may print sensitive info to CloudWatch
- Use a secrets manager that allows for rotation
- Encrypt any sensitive data traversing your serverless application
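One way to reduce the risk of sensitive values reaching the logs is to redact them before printing. The sketch below assumes a hypothetical list of sensitive key names; the right list depends on your own payloads.

```python
from typing import Any

# Assumed key names for illustration; adjust to the payloads you handle.
SENSITIVE_KEYS = {"password", "token", "authorization", "card_number"}


def redact(value: Any) -> Any:
    """Return a copy of `value` with sensitive fields masked, recursively."""
    if isinstance(value, dict):
        return {
            key: "***" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


event = {"user": "alice", "password": "hunter2", "items": [{"Token": "abc"}]}
safe = redact(event)
print(safe)  # safe to write to stdout; the original event is unchanged
```

Redaction complements, rather than replaces, encryption and a secrets manager: it only guards the logging path.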
Reliability#
- Regulate inbound request rates
  - Throttle at the API level based on access patterns (return 429)
  - Issue API keys to clients with usage plans
- Look at concurrency controls where needed; for single-shard processing, look at using Kinesis Data Streams
Cost Optimization#
(Speed to market vs cost)
- Pay-per-value pricing model
- CPU, network, and storage IOPS are allocated based on the memory selection
- Use key:value tags for the projects to track Lambda billing
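Because compute is billed on memory and duration together, a back-of-the-envelope estimate can help when weighing memory settings. The sketch below computes GB-seconds and a total from caller-supplied prices; the function names are hypothetical, and the prices in the usage lines are placeholders rather than real AWS rates, so check the current pricing page for your region.

```python
def gb_seconds(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Compute usage: invocations x seconds per invocation x GB of memory."""
    return invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)


def estimate_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                  price_per_gb_second: float,
                  price_per_million_requests: float) -> float:
    """Estimated bill = compute charge + request charge (prices supplied by caller)."""
    compute = gb_seconds(invocations, avg_duration_ms, memory_mb) * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


# Doubling memory while halving duration leaves the GB-seconds unchanged.
small = gb_seconds(1_000_000, avg_duration_ms=200, memory_mb=512)
large = gb_seconds(1_000_000, avg_duration_ms=100, memory_mb=1024)

# Placeholder prices, NOT real AWS rates.
total = estimate_cost(1_000_000, 200, 512,
                      price_per_gb_second=0.00002,
                      price_per_million_requests=0.20)
```

Since a higher memory setting also brings more CPU, a function that finishes proportionally faster at a larger size can cost about the same while responding sooner, which is why a performance test is worth running before fixing the configuration.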
Failure Management#
- Build in retry logic to your FaaS where possible
- Use Step Functions to minimise try/catch, back-off and retry logic
- When consuming from Kinesis and DynamoDB use Lambda error handling and DLQ on failure
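Where retry logic does live inside a function, a minimal sketch of retries with exponential back-off and jitter might look like this. The `retry` helper, its defaults and the `flaky` example are assumptions for illustration; catching every `Exception` is a simplification, and real code would usually retry only on errors known to be transient.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call` up to `attempts` times, backing off between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential back-off with full jitter: wait a random time
            # between 0 and base_delay * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")


# Example: a downstream call that fails twice, then succeeds.
outcomes: List[str] = ["fail", "fail", "ok"]
delays: List[float] = []


def flaky() -> str:
    outcome = outcomes.pop(0)
    if outcome == "fail":
        raise ConnectionError("temporary failure")
    return outcome


result = retry(flaky, attempts=3, base_delay=0.1, sleep=delays.append)
```

The `sleep` parameter is injectable so tests can record the delays instead of waiting. Retried calls should also be idempotent, since the downstream service may have processed a request whose response was lost.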
Performance Efficiency#
- Selection - perf test to know the right configuration and Edge / regional API decisions
- Long-running jobs should be passed to Step Functions (no additional cost incurred as this is billed per state transition and not time spent in a state)
- Use global variables to maintain connections to your data stores or other services and resources
- Use Athena to tune S3 queries (get only the data required) or S3 Select
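The point about global variables can be sketched as follows: the connection is created once per execution environment and reused by later invocations that land on the same environment. `FakeConnection` is a hypothetical stand-in for a real database or service client, used so the example runs without any AWS dependency.

```python
from typing import Any, Dict, Optional


class FakeConnection:
    """Hypothetical stand-in for a real data store client."""

    instances = 0  # counts how many connections were opened

    def __init__(self) -> None:
        FakeConnection.instances += 1

    def query(self, key: str) -> Dict[str, str]:
        return {"key": key}


# Module-level (global) state survives between invocations while the
# execution environment stays warm.
_connection: Optional[FakeConnection] = None


def get_connection() -> FakeConnection:
    """Create the connection lazily on first use, then reuse it."""
    global _connection
    if _connection is None:
        _connection = FakeConnection()
    return _connection


def handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    return get_connection().query(event["key"])


first = handler({"key": "a"}, None)
second = handler({"key": "b"}, None)  # reuses the connection opened above
```

Only connections and clients should be cached this way; request-specific data kept in globals would leak between invocations, which conflicts with the "share nothing" principle.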