BookingLive Server Policy

Architecture

The following content is based upon BookingLive being hosted in Ireland, the zone specified for the architecture. BookingLive never holds any data in its own systems, so these policies sit directly with Amazon, who operate a number of standards. You can read more here:

Platform

BookingLive sits on the Silverstripe framework / platform. The overall architecture is managed and operated by Silverstripe on the AWS platform on behalf of BookingLive, supporting the Silverstripe framework and underlying architectural code. The full architecture diagram is attached in Appendix 1 – Full Architectural Diagram.

Usage

The proposed solution is scalable to thousands of page views a day. If the situation changes, the system can be scaled further so that it grows with requirements (subject to cost).

Servers

A maximum of four servers will be provisioned, and these will autoscale within the availability zone.

VPC

A VPC is a logical container that encapsulates Customer-specific resources, e.g. subnets, routing tables, EC2 instances, internet gateways, VPN, security groups and network ACLs. It has a CIDR IP range that defines the extent of the VPC.

Multiple Stacks can exist in a single VPC. A specific VPC contains all of the services for a specific stack, and there is limited ability to share services across VPCs. A single VPC will be used to contain the booking system, and as such will segment all data from other cloud users in addition to the security measures already in place.
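
As an illustration of how a CIDR range defines the extent of a VPC, the following is a minimal sketch using Python's standard ipaddress module. The 10.0.0.0/16 block and the subnet names are assumptions made for the example; the actual range used for the BookingLive VPC is not stated in this document.

```python
import ipaddress

# Hypothetical CIDR block for illustration; the real VPC range is not stated here.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /24 subnets and take the first two (illustrative names).
subnets = list(vpc.subnets(new_prefix=24))
public_subnet, private_subnet = subnets[0], subnets[1]


def in_vpc(address: str) -> bool:
    """Return True if the address falls inside the VPC's CIDR range."""
    return ipaddress.ip_address(address) in vpc


print(vpc.num_addresses)               # total addresses covered by the range
print(public_subnet, private_subnet)   # first two /24 subnets inside the VPC
print(in_vpc("10.0.1.25"), in_vpc("192.168.1.10"))
```

Any address outside the range, such as the second address checked above, cannot belong to a resource inside the VPC.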

Auto Scaling Groups in Elastic Stacks

These Application Instances are configured in an autoscaling group which includes settings for a minimum and maximum number of instances to be made available.

These instances are then scaled up and down depending on CPU load metrics to ensure that capacity adjusts to incoming traffic.
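
The scaling behaviour described above can be sketched roughly as follows. The maximum of four instances comes from the Servers section; the minimum of one instance, the CPU thresholds and the function name are assumptions for illustration only and do not reflect the configured values.

```python
def desired_instances(current: int, avg_cpu: float,
                      min_size: int = 1, max_size: int = 4,
                      scale_out_at: float = 70.0,
                      scale_in_at: float = 30.0) -> int:
    """Return the instance count after one scaling decision.

    min_size and both CPU thresholds are assumed values; max_size follows
    the four-server limit given in the Servers section.
    """
    if avg_cpu >= scale_out_at:
        target = current + 1      # high load: add an instance
    elif avg_cpu <= scale_in_at:
        target = current - 1      # low load: remove an instance
    else:
        target = current          # load is within the comfortable band
    # Never leave the configured minimum/maximum bounds.
    return max(min_size, min(max_size, target))


print(desired_instances(current=2, avg_cpu=85.0))
print(desired_instances(current=4, avg_cpu=95.0))
```

The second call shows the effect of the upper bound: even under heavy load the group does not grow past its maximum.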

Application Load Balancing in Elastic Stacks (provided by ELB)

The ELB service is a scalable solution for load balancing instances. It provides a dynamically scaled set of EC2 instances that route TCP traffic to a set of health-checked back-end instances.

Requests are routed using the least outstanding requests algorithm (for HTTP/HTTPS connections), which favours back-end instances with the fewest outstanding requests. The ELB can only balance instances that are located in the same Region.
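
The idea behind least outstanding requests routing can be sketched as below. This is a simplified model, not the ELB implementation: the Backend class, the instance names and the tie-breaking behaviour (first instance in the list wins) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Illustrative model of a back-end instance behind the load balancer."""
    name: str
    healthy: bool = True
    outstanding: int = 0   # requests currently in flight


def pick_backend(backends):
    """Choose the healthy back-end with the fewest outstanding requests."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy back-end instances")
    return min(candidates, key=lambda b: b.outstanding)


def route(backends):
    """Route one request and record it as outstanding on the chosen back-end."""
    chosen = pick_backend(backends)
    chosen.outstanding += 1
    return chosen.name


pool = [
    Backend("web-1", outstanding=3),
    Backend("web-2", outstanding=1),
    Backend("web-3", healthy=False),   # failed its health check, never chosen
]
print(route(pool))
```

Instances that fail their health check are excluded before the comparison, which matches the description of routing only to health-checked back-end instances.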

Database (provided by AWS RDS)

The solution uses the AWS Relational Database Service (RDS) for storage of website database content. Specifically, it uses the MySQL 5.5 version of AWS RDS for maximum performance and compatibility with the Silverstripe CMS. RDS is deployed in a Master / Slave configuration that is transparent from the application's point of view. If the Master database fails or there is an outage, the Slave is automatically promoted to Master and a new Slave is created.

The application connects only to the RDS instance acting as Master, via a CNAME to allow for dynamic IP selection, and Security Groups are configured to allow only web server instances within the Environment to connect to any RDS instance.

Amazon RDS instances are automatically backed up daily, and the backups are retained for 30 days. These backups allow point-in-time recovery to any moment within the last 30 days.
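
As a simple illustration of the 30-day recovery window, the sketch below checks whether a requested restore time can be served from the retained backups. The function name is hypothetical, and the check ignores any short lag between the latest restorable time and the present.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)   # retention period stated above


def can_restore_to(target: datetime, now: datetime) -> bool:
    """True if `target` lies within the recovery window ending at `now`."""
    return now - RETENTION <= target <= now


now = datetime(2024, 1, 31, tzinfo=timezone.utc)   # example date, not a real event
print(can_restore_to(now - timedelta(days=10), now))
print(can_restore_to(now - timedelta(days=45), now))
```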

In case of an infrastructure failure (for example, instance hardware failure, storage failure, or network disruption), Amazon RDS performs an automatic failover to the standby so that the Application can resume database operations as soon as the failover is complete. Since the endpoint for the DB Instance remains the same after a failover, the Application can resume database operations without the need for manual administrative intervention. Failover does take a limited period of time, and during that time the website will show the configured 500 error page.

AWS DynamoDB

DynamoDB is a NoSQL storage engine that can be used by Applications. This service cannot be used as a backend for normal Application models, but custom code can interact with it, and it can give improved performance over RDS (with several trade-offs).

Currently DynamoDB is used in Elastic Stacks for storing website users' sessions in a centralised location, so that traffic can be load balanced between all web server instances without needing to use ‘sticky sessions’.
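
The reason a centralised session store removes the need for sticky sessions can be shown with a small model. The SessionStore class below is an in-memory stand-in for the shared session table, not a DynamoDB client, and all class, method and key names are hypothetical.

```python
class SessionStore:
    """In-memory stand-in for the shared session table (not a DynamoDB client)."""

    def __init__(self):
        self._items = {}

    def put(self, session_id, data):
        self._items[session_id] = dict(data)

    def get(self, session_id):
        return self._items.get(session_id)


class WebServer:
    """Illustrative web server instance that keeps no session state locally."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, updates=None):
        # Load the session from the shared store, apply changes, write it back.
        session = self.store.get(session_id) or {}
        if updates:
            session.update(updates)
        self.store.put(session_id, session)
        return session


store = SessionStore()
web1, web2 = WebServer("web-1", store), WebServer("web-2", store)
web1.handle("abc123", {"basket_items": 2})   # first request lands on web-1
print(web2.handle("abc123"))                 # next request lands on web-2
```

Because both servers read and write the same store, the second request sees the session created by the first, regardless of which instance the load balancer selects.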

Incapsula

In most circumstances, HTTP and HTTPS traffic intended for a Stack will first be passed through the Incapsula WAF. This is achieved by configuring the DNS records of the Stack to point to the Incapsula WAF. Global routing management is used to direct this traffic to the nearest Incapsula datacenter, where Layer 7 proxies perform automatic caching and security filtering, and then pass on the request to the AWS ELB via regular internet routes.

The Incapsula WAF provides request security filtering, DDoS protection and caching optimisation using a globally distributed Content Delivery Network (CDN). For HTTPS traffic, the Incapsula WAF provides SSL trans-encryption (traffic is decrypted, then re-encrypted before transmission).

Availability and Backup Architecture

The following content is based upon BookingLive being hosted in Ireland, the zone specified for the architecture.

Points of failure and impact

Resource                        Elastic Single AZ
Single application instance     Seamless if auto-scaling has created multiple instances;
                                healing outage if auto-scaling has reduced to a single instance
ELB instance                    Seamless
Single RDS instance – master    Significant outage
Single RDS instance – slave     N/A
NAT instance                    Significant outage to outgoing requests
Bastion instance                Significant outage to administrative access
Availability Zone               Significant outage
Region                          Significant outage
NFS instance                    Significant outage

A seamless response should be mostly unnoticeable. Requests that were in process of being handled by the resource in question at the moment of failure will themselves fail, and could cause user-visible errors for users with such requests. Performance may momentarily dip while a replacement resource is spun up. The resiliency of the Environment to further outages will be reduced.

A healing outage consists of a ten (10) minute outage while the cluster automatically heals and brings up new EC2 instances and/or reconfigures the RDS instance. The resiliency of the Environment to further outages during healing will be significantly reduced.

A significant outage consists of an outage greater than ten minutes. Manual intervention will be required to either restore the failed resource or re-deploy to an alternative Availability Zone or Region. Restoration of data from backups may be required.
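
The three impact levels defined above can be summarised by outage duration, as in the sketch below. Treating any outage of up to ten minutes as a healing outage, and a seamless response as zero downtime, is a simplifying assumption; as noted above, a seamless response may still cause individual in-flight requests to fail.

```python
def classify_outage(minutes: float) -> str:
    """Map an outage duration in minutes to the impact levels defined above."""
    if minutes <= 0:
        return "seamless"            # no site-wide outage
    if minutes <= 10:
        return "healing outage"      # cluster heals automatically
    return "significant outage"      # manual intervention required


for duration in (0, 10, 45):
    print(duration, classify_outage(duration))
```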

Disaster recovery

Recovery in the case of disaster is handled by auto-scaling groups, which will launch a new instance in an unaffected Availability Zone, rebuild it using the instance's rules and restore the latest data to it. This will cause a website outage while the instance is provisioned.

Because there is no multi-zone requirement, no automated DR failover is in place for the case where an individual AZ fails. Recovery from such a failure will require manual intervention and a decision-making process to start the instances within another available European AZ, using the same configurations and the available backups.

Back Up Methods

Backups are stored in S3 and, after two (2) months, moved to Glacier for long-term archival. Storage is secured on a least-privilege basis, and backups are accessible only to the environment that created them. The backups are stored encrypted.
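
The storage tier of a given backup follows from its age, as sketched below. Approximating "two months" as 60 days is an assumption made for the example, and the function name is hypothetical.

```python
from datetime import date, timedelta

GLACIER_AFTER = timedelta(days=60)   # assumption: "two months" taken as 60 days


def storage_tier(created: date, today: date) -> str:
    """Return where a backup created on `created` is expected to be held on `today`."""
    age = today - created
    return "Glacier" if age >= GLACIER_AFTER else "S3"


print(storage_tier(date(2024, 1, 1), date(2024, 1, 15)))
print(storage_tier(date(2024, 1, 1), date(2024, 6, 1)))
```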

AWS S3 storage

AWS S3 provides highly available and redundant storage that is used for backups. S3 is designed for 99.999999999% durability and 99.99% availability of objects over a given year.
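
To put these design figures in perspective, the arithmetic below converts them into an expected number of lost objects and an expected amount of unavailability per year. Reading the percentages as simple annual averages is a simplification, and the object count of ten million is an arbitrary example rather than a BookingLive figure.

```python
ANNUAL_LOSS_RATE = 1e-11        # 1 - 0.99999999999 (eleven nines of durability)
ANNUAL_UNAVAILABILITY = 1e-4    # 1 - 0.9999 (99.99% availability)
MINUTES_PER_YEAR = 365 * 24 * 60


def expected_annual_losses(object_count: int) -> float:
    """Expected number of objects lost per year at the design durability."""
    return object_count * ANNUAL_LOSS_RATE


def expected_annual_downtime_minutes() -> float:
    """Expected minutes per year an object is unavailable at the design availability."""
    return MINUTES_PER_YEAR * ANNUAL_UNAVAILABILITY


print(expected_annual_losses(10_000_000))
print(expected_annual_downtime_minutes())
```

Under these assumptions, ten million stored objects correspond to roughly one lost object every 10,000 years, and the availability figure corresponds to a little under an hour of unavailability per year.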

AWS Glacier

Amazon Glacier is an extremely low-cost cloud archive storage service that provides secure and durable storage for data archiving and online backup. In order to keep costs low, Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable.

Glacier is used for archiving backups for long-term storage, with a minimum retention period of 5 years as detailed in the supplier's tender.

Additional Services

Logging

All instances within an environment send all logs to a centralised logging server that is otherwise locked down. This allows later access even if an instance's own logs are unavailable, enables analysis across multiple instances, and provides a secure audit trail if an instance is compromised by an attack.

This logging includes:

  • syslog
  • instance SSH logins
  • system modifications
  • web server access logs
  • web server error logs
  • CMS application errors
  • email transmission logs
  • CMS / audit log

Clients will not have access to these logs.

Responsibility Matrix

Access Matrix

Roles: BookingLive CM, BookingLive SM, BookingLive DM, BookingLive TS, BookingLive TA, Administrator, Silverstripe

Services: Centralised logging, Centralised monitoring, Centralised configuration, Service desk, Code repository, Deployment (excluding Production), Deployment (Production), Application administration

Customer Manager (CM)

The Customer Manager is the primary contact and the person with the highest authority. Each Customer has exactly one Customer Manager associated with it, but a backup may be allocated.

Stack Manager (SM)

The Stack Manager is the business person who looks after the website(s) for a Stack on a day-to-day basis. This person does not normally produce technical solutions or content but oversees the running of the website. They are sometimes called the Product Manager or Site Manager. Each Stack has exactly one Stack Manager associated with it, but a backup may be allocated.

Deployment Manager (DM)

The Deployment Manager is an optional technical person who looks after the website(s) for a Stack on a day-to-day basis. Sometimes a Stack Manager wants to delegate the technical responsibilities included in the Stack Manager role to another person. For instance, if the Customer has contracted an SPO to provide development services, they may wish to assign authority to manage deployments to that SPO without also assigning authority to make purchasing decisions. Each Stack can have no more than one Deployment Manager associated with it, but a backup may be allocated.

Technical Staff (TS)

Technical Staff are often members of a web development team and are responsible for Application Code. They will get access to a code repository and to deployment tools that promote an agile delivery workflow. Each Stack can have an unlimited number of Technical Staff associated with it.

Technical Auditor (TA)

Technical Auditors are often members of a web development team, and are responsible for reviewing Application Code produced by other Technical Staff, but are not otherwise involved in the operation of the Stack. They will get read-only access to the Stack’s code repositories. Each Stack can have an unlimited number of Technical Auditors associated with it.

Administrator (Administrator)

An Administrator has full access to BookingLive. They can add and remove users from the system as well as access all the features of the site. Each Environment can have an unlimited number of Administrators associated with it.

Security Questions

Malware/Vulnerability

There are two tiers to this process. First, the direct server management team will keep the system up to date with OS-level patching against new threats such as Heartbleed, with the aim of resolving these within 24 hours. Additionally, you will be using the Incapsula Enterprise service, which you can read more about here – https://www.incapsula.com/enterprise-plan.html

This adds a non-human monitoring element that provides constant notification of attack vectors such as SQL injection, RFI and DDoS, as well as monitoring for malware that may be introduced via those attack vectors.

Penetration Testing

The platform has been penetration tested, but penetration testing of your application before going live is the client's responsibility. Because your applications have not yet been developed, there are no reports we can share with you: existing reports are confidential to the clients who initiated them and would be of no value to you.

Penetration testing is provided via an external organisation, for example NCC, who assess the platform once it is configured and a version of the software is deemed production ready.

Data Protection

See Appendix 2 – Data Protection.

Data Destruction

We never hold any of your system data in our systems, so these policies sit directly with Amazon, who operate a number of standards to satisfy your requirements here. You can read more here –

Where a contract on our AWS platform ends, the instances and backups will be removed as part of that contract on a date agreed with the client. Deletion of these items is handled by Amazon directly, and neither BookingLive nor third parties have access to, or the ability to reverse, this process.

Appendix

Appendix 1 – Full Architectural Diagram

The implementation of BookingLive is based within the European Union (Ireland zone) by default. The layout is the same as the architectural diagram below.

Appendix 2 – Data Protection

Please refer to the external data protection document.