
Serverless Payments with AWS Step Functions

17 July, 2020

Written by Paul Freedman


Serverless payment processing

One of our main focuses in Fintech recently has been implementing a payment processing flow that uses AWS Step Functions. For the uninitiated, AWS Step Functions is essentially a way of defining a state machine that runs in the cloud. You supply a JSON input, and the execution flows from step to step before ultimately producing an output. The step function is defined in either a JSON or YAML file, which describes what each step should do (e.g. run a specific Lambda function or publish an SNS message) and the order in which the steps should run.

AWS Step Functions offers us myriad benefits including:

  • excellent scalability
  • freedom from managing underlying infrastructure
  • a modular approach, meaning we can utilise different technologies and even languages for individual parts of the application

However, developing and deploying code this way is a fundamentally different paradigm from running Docker containers or EC2 instances, and it took some getting used to.

Our payment processing step function

Below is a simplified representation of our step function, with some parts streamlined for illustrative purposes.

NOTE: PSP here stands for [Payment Service Provider](https://en.wikipedia.org/wiki/Payment_service_provider)

[Figure: simplified diagram of the payment processing step function]

The basic flow is:

  • The step function is triggered with a JSON payload containing donation information
  • The PSP to use is selected using data from the JSON payload and/or configuration
  • The selected PSP lambda is executed, which calls out to the PSP's API to take a payment
  • We examine the response from the PSP lambda to see whether the call was successful

    • If not, we notify our other systems that there will be a delay in collecting the payment and wait before attempting the payment again
  • After completing the transaction we trigger some post-processing steps to update other systems about this transaction
  • Before finishing we persist the final JSON state of the step function so that we can query it later

The JSON payload gets passed down and augmented by each step so that when PersistOutput is reached we are able to store a full summary of everything that occurred during this transaction. This is useful if individual transactions need to be examined at a later date.
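As a rough illustration of how a payload can be passed down and augmented, here is a minimal sketch in which each step returns the incoming state plus its own contribution. The `PaymentState` shape, the `selectPsp` and `recordAttempt` functions, and the selection rule are all hypothetical; only the `attempts` and `outcome` field names echo the tests shown later in this post.

```typescript
// Hypothetical payload shape, for illustration only.
interface Outcome {
  success: boolean;
  retry: boolean;
}

interface PaymentState {
  donationId: string;
  amount: number;
  selectedPsp?: string;
  attempts: Outcome[];
  outcome?: Outcome;
}

// Each step copies the incoming state and adds its own result,
// so nothing recorded by an earlier step is lost.
export const selectPsp = (state: PaymentState): PaymentState => ({
  ...state,
  selectedPsp: state.amount > 1000 ? 'psp-a' : 'psp-b', // made-up selection rule
});

export const recordAttempt = (
  state: PaymentState,
  outcome: Outcome
): PaymentState => ({
  ...state,
  attempts: [...state.attempts, outcome],
  outcome,
});
```

Because every step returns a superset of what it received, the last step sees the accumulated history of the whole transaction and can persist it in one go.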

The lambda functions

The lambda functions are defined using the Serverless framework:

---
setup:
  name: setup
  handler: src/handlers/setup.default

and referenced from the step function YAML file:

---
Setup:
  Type: Task
  Resource:
    Fn::GetAtt: [setup, Arn]
  Next: SelectPsp

The exception is the PSP lambdas, each of which is defined in its own repository. This allows us to:

  • keep common functionality separated from the much more opinionated implementations that each PSP uses

    • for example, each PSP has its own response codes that need to be mapped, and API request objects with differing signatures
  • define differing retry schedules per PSP

    • when a retry is needed, the PSP lambda returns a delay to the step function, telling it how long to wait before making another attempt to authorise the payment
  • keep related PSP lambda functions together, e.g. capturing payments and performing refunds

A common interface is agreed between the step function and any PSP lambdas so we can use them interchangeably.
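To make the idea of a shared contract more concrete, here is a minimal sketch of the response half of such an interface, together with a mapper for one imaginary PSP. The `PspResponse` type, the `retryDelaySeconds` field, the response codes and the retry schedule are assumptions made for illustration, not our actual contract; only `success` and `retry` match field names used in the tests below.

```typescript
// Hypothetical common response returned by every PSP lambda.
interface PspResponse {
  success: boolean;
  retry: boolean;
  retryDelaySeconds?: number; // only set when retry is true
}

// Made-up retry schedule for one PSP: wait longer after each failed attempt.
const retryScheduleSeconds = [60, 3600, 86400];

// Invented response codes that this imaginary PSP treats as temporary failures.
const retryableCodes = new Set(['INSUFFICIENT_FUNDS', 'PSP_TIMEOUT']);

// Maps a PSP-specific response code onto the common response shape.
// attemptNumber is 1-based; once the schedule is exhausted we stop retrying.
export const mapPspResult = (
  pspCode: string,
  attemptNumber: number
): PspResponse => {
  if (pspCode === 'APPROVED') {
    return { success: true, retry: false };
  }
  const delay = retryScheduleSeconds[attemptNumber - 1];
  if (!retryableCodes.has(pspCode) || delay === undefined) {
    return { success: false, retry: false };
  }
  return { success: false, retry: true, retryDelaySeconds: delay };
};
```

Keeping the mapping and the schedule inside each PSP's own code means the step function only ever has to understand the common shape.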

End-to-end tests

As part of our build pipeline we run end-to-end tests against the step function whenever it is deployed to our test environment. These tests simulate different payment scenarios and validate that the step function behaves as expected.

Tests can validate fields on the output:

expect(output.attempts.length).toBe(0);
expect(output.outcome.success).toBe(false);
expect(output.outcome.retry).toBe(false);

and check that individual lambda function steps were executed (or not):

filter(StepType.LambdaSucceeded, events, (e) => {
  expect(count(e, 'SelectPsp')).toBe(1);
  expect(count(e, 'RedirectToPsp')).toBe(0);
  expect(count(e, 'PostProcessing')).toBe(1);
  expect(count(e, 'PersistOutput')).toBe(1);
});
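The `filter` and `count` helpers used above are not shown in this post. As a rough illustration of the kind of check involved, the sketch below counts how often a named state appears in an execution's event history. The `HistoryEventLike` type and the `countStateExits` function are hypothetical, and counting `TaskStateExited` events is an assumption about what "executed" means; `type` and `stateExitedEventDetails.name` are fields of the Step Functions history events.

```typescript
// Minimal structural subset of the SDK's HistoryEvent type, declared locally so
// the sketch stands alone; the real type has many more optional fields.
interface HistoryEventLike {
  id: number;
  type: string;
  stateExitedEventDetails?: { name: string };
}

// Counts how many times a named Task state was exited during an execution.
// Assumption: a state counts as "executed" when a TaskStateExited event
// carries its name.
export const countStateExits = (
  events: HistoryEventLike[],
  stateName: string
): number =>
  events.filter(
    (e) =>
      e.type === 'TaskStateExited' &&
      e.stateExitedEventDetails?.name === stateName
  ).length;
```

A test could then assert, for example, that `countStateExits(events, 'SelectPsp')` is 1 and `countStateExits(events, 'RedirectToPsp')` is 0.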

The test runner

Our end-to-end tests use a test runner module, which uses the AWS SDK's StepFunctions client to trigger an execution and then retrieve information about it for use in the calling test. An environment variable specifies the ARN of the step function under test, making it very simple to deploy a new instance of the step function and run our tests against it in isolation.

This top-level function gives a good sense of how it works without diving into the full detail:

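export const runToCompletion = async (
  name: string,
  input: StepFunctionInput
): Promise<[any, StepFunctions.HistoryEvent[]]> => {
  // Suffix the name with a timestamp so that each test run gets its own execution name
  const executionName = `${name}-${Date.now()}`;
  // awsRegion, stepFunctionsEndpoint and stateMachineArn are defined elsewhere in the module
  const stepFunctions = new StepFunctions({
    region: awsRegion,
    endpoint: stepFunctionsEndpoint,
  });
  // Start the execution and obtain its description (including the output)
  const describe: StepFunctions.DescribeExecutionOutput = await runStateMachine(
    stateMachineArn,
    input,
    executionName,
    stepFunctions
  );
  // Fetch the event history for that execution
  const events: StepFunctions.HistoryEvent[] = await getExecutionEvents(
    stepFunctions,
    describe
  );
  // Fail the test early if the execution produced no output
  if (!describe.output) {
    throw new Error('Execution output is empty');
  }
  return [JSON.parse(describe.output), events];
};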

Broadly, we are:

  • creating an SDK instance
  • using it to trigger an execution with the input defined in the test
  • using the description of that execution to retrieve the full list of events that occurred
  • returning the output and event history to the calling test

The key point here is that the actual code under test is running in AWS, whereas the test is running on the build agent (or locally).
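The `runStateMachine` helper is not shown in this post. As a minimal sketch of what triggering an execution and waiting for it to finish can look like, the code below starts an execution and polls until it leaves the `RUNNING` state. The `startAndWait` function, the `StepFunctionsLike` and `ExecutionDescription` types, and the polling parameters are hypothetical; the two client methods are modelled on the `startExecution` and `describeExecution` calls of the AWS SDK v2 client and declared locally so the sketch stands alone.

```typescript
// Local stand-in for the parts of the SDK client this sketch needs.
interface ExecutionDescription {
  executionArn: string;
  status: string; // RUNNING, SUCCEEDED, FAILED, TIMED_OUT or ABORTED
  output?: string;
}

interface StepFunctionsLike {
  startExecution(params: {
    stateMachineArn: string;
    name?: string;
    input?: string;
  }): { promise(): Promise<{ executionArn: string }> };
  describeExecution(params: { executionArn: string }): {
    promise(): Promise<ExecutionDescription>;
  };
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: starts an execution, then polls until it is no longer RUNNING.
export const startAndWait = async (
  client: StepFunctionsLike,
  stateMachineArn: string,
  executionName: string,
  input: unknown,
  pollIntervalMs = 1000,
  maxPolls = 120
): Promise<ExecutionDescription> => {
  const { executionArn } = await client
    .startExecution({
      stateMachineArn,
      name: executionName,
      input: JSON.stringify(input),
    })
    .promise();

  for (let poll = 0; poll < maxPolls; poll++) {
    const description = await client
      .describeExecution({ executionArn })
      .promise();
    if (description.status !== 'RUNNING') {
      return description;
    }
    await sleep(pollIntervalMs);
  }
  throw new Error(
    `Execution ${executionArn} still running after ${maxPolls} polls`
  );
};
```

Capping the number of polls keeps a stuck execution from hanging the build, and passing the client in as a parameter makes the helper easy to exercise with a fake in unit tests.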

Wrapping up

Our payment processing step function has been in production for a while now and has proven to be very robust, coping with periods of increased demand without breaking a sweat. We have enjoyed the rapid, frictionless release cycle we have been able to adopt as a result of both having infrastructure management abstracted away and having a thorough suite of tests to confirm that the system still meets all of the business's needs.

Having such an effective system taking payments helps us to ensure that all the good causes on JustGiving continue to receive the money from all our kind donors as quickly as possible and allows us to focus on continuing to deliver even more awesome features!