Notes from an AWS Architect Interview


Recently I interviewed for an AWS architect position. The whole process was long, almost half a year. Along the way I studied how AWS approaches its cloud products and learned a lot about their features and design, so it was still well worth it.
The earlier written test, which asked for a response to a customer's requirements, went smoothly. In the video interview, however, I was judged not to understand the concept of an Availability Zone, and the interview was ended. Even so, I came away with a reasonable understanding of the AWS product range, including how to choose services for a customer in practice.
Back to the point: below I record the content of the interview and my approach.

Original interview:

Imagine that you meet with a small startup company in the early stages of their operations. Currently their architecture uses a LAMP stack with MySQL, Apache and PHP all running on one desktop PC within their small office. Like many small start-ups they are confident that they will be the next big thing and expect significant, rapid, yet un-quantified growth in the next few months. With this in mind, they are concerned about:

  • scaling to meet the demand, but with uncertainty around when and how much this demand will be they are very concerned about buying too much infrastructure too soon or not enough too late!
  • their lack of provision for Disaster Recovery
  • their ability to configure their database and data access layer for high performance and throughput
  • making the user experience in the browser very low latency even though a large portion of their user base will be from far away
  • effective distribution of load
  • a self-healing infrastructure that recovers from failed service instances
  • security of data at rest and in transit
  • securing access to the environment as the delivery team expands
  • an archival strategy for inactive objects greater than 6 months
  • ability to easily manage and replicate multiple environments based on their blueprint architecture

Recommend a manageable, secure, scalable, high performance, efficient, elastic, highly available, fault tolerant and recoverable architecture that allows the startup to organically grow. The architecture should specifically address the requirements/concerns as described above.

(1) A well written document in PDF format with no more than 6 pages. (Note: The proposal should be a document, not slides.)
(2) Clearly and succinctly present an analysis of the startups requirements of how and why use every AWS services specifically based on your understanding.
(3) Proposed architecture diagram give a detailed description for your architecture diagram and explained why you choose this solution.
(4) Clearly state all assumptions and references made during the design and explicitly state the referenced Amazon Web Services.

Executive Summary

Requirements Analysis

The customer runs a typical LAMP stack. Such a system is usually divided into a web layer, an application layer and a data layer. Based on the customer's ideas and concerns, the requirements can be analyzed as follows:

Scaling to meet demand, with great concern about buying too much infrastructure too soon or not enough too late, because the timing and size of the demand are uncertain

The customer cannot predict its future scale. Investing too early wastes resources and money, while investing too late may hold back the business. The cloud platform therefore needs elastic scaling, so that the service grows or shrinks automatically with traffic.
Solution: Amazon EC2 Auto Scaling keeps the EC2 fleet available and automatically grows or shrinks it as needed, which maximizes performance and reduces cost. Instances can be purchased On-Demand, paying only for the compute capacity actually consumed, rather than committing to Reserved Instances.
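As a rough illustration of the elastic scaling described above, the sketch below builds the parameters for a target tracking scaling policy that holds average CPU near a target. The group name "web-asg" and the 50% target are assumptions made for this example, not values from the interview answer. The resulting dict is shaped like the keyword arguments of the Auto Scaling PutScalingPolicy API; nothing is sent to AWS here.

```python
def target_tracking_policy(group_name: str, cpu_target: float) -> dict:
    """Build keyword arguments shaped like the Auto Scaling PutScalingPolicy API."""
    if not 0 < cpu_target <= 100:
        raise ValueError("cpu_target must be a percentage in (0, 100]")
    return {
        "AutoScalingGroupName": group_name,          # hypothetical group name
        "PolicyName": f"{group_name}-cpu-target",    # hypothetical policy name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                # Average CPU utilization across the Auto Scaling group.
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(cpu_target),
        },
    }


# Assumed values for illustration only.
policy = target_tracking_policy("web-asg", 50)
```

With boto3 installed and credentials configured, a dict like this could be passed as keyword arguments to the autoscaling client's put_scaling_policy call.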

Lack of provision for disaster recovery

If a self-hosted deployment fails, the consequences can be disastrous; even if service is restored, some data may be lost. In the cloud, you need to be able to recover from failures quickly and ensure that data is not lost.
Solution: use the Amazon EC2 instance recovery mechanism. If an instance has a problem, a replacement instance can be started quickly and predictably. For the database, Amazon RDS provides a highly available deployment made up of a primary database and a standby database, with the standby usually placed in another Availability Zone (AZ). Amazon RDS synchronously replicates data to the standby in the other AZ, and database snapshots are taken as well. In addition, if hardware fails, Amazon RDS automatically replaces the compute instance that backs the deployment.
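To make the primary and standby arrangement more concrete, here is a minimal sketch of the settings involved. The identifier "startup-db", the instance class, the storage size and the 7-day backup retention are all assumptions chosen for illustration. The keys follow the RDS CreateDBInstance API; credentials and networking settings are deliberately omitted, so this is a fragment rather than a complete request.

```python
def multi_az_mysql_params(identifier: str, backup_days: int = 7) -> dict:
    """Build a partial parameter set shaped like the RDS CreateDBInstance API."""
    if not 1 <= backup_days <= 35:
        raise ValueError("backup_days must be between 1 and 35")
    return {
        "DBInstanceIdentifier": identifier,    # hypothetical name
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.medium",     # assumed size
        "AllocatedStorage": 100,               # assumed size in GiB
        "MultiAZ": True,                       # standby in another Availability Zone
        "BackupRetentionPeriod": backup_days,  # days of automated backups
        # Master credentials, subnet group and security groups are omitted here.
    }


params = multi_az_mysql_params("startup-db")
```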

Ability to configure the database and data access layer for high performance and throughput

Solution: RDS database performance can be analyzed and tuned with Performance Insights, which helps the customer quickly assess the load on a relational database.
In addition, the following measures improve performance:
1. In terms of configuration, use a high-performance storage type, Provisioned IOPS (SSD), so that the database delivers fast reads and writes;
2. Spread database load by separating reads from writes across multiple read replicas;
3. Where needed, reduce the number of database accesses through caching.
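The caching measure in point 3 is commonly implemented with the cache-aside pattern. The sketch below shows the idea with an in-memory dict standing in for the cache and a plain dict standing in for the database. Both stand-ins, and the class and key names, are assumptions for illustration; in the proposed architecture the cache would be ElastiCache and the loader would query RDS.

```python
from typing import Callable, Dict


class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load = load_from_db
        self._cache: Dict[str, str] = {}
        self.db_reads = 0  # counts how often the database was actually hit

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]      # cache hit: no database access
        self.db_reads += 1
        value = self._load(key)          # cache miss: read from the database
        self._cache[key] = value         # populate the cache for later reads
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row has been written."""
        self._cache.pop(key, None)


FAKE_DB = {"user:1": "alice"}            # stand-in for the MySQL database
store = CacheAside(lambda key: FAKE_DB[key])
first = store.get("user:1")              # miss, loads from FAKE_DB
second = store.get("user:1")             # hit, served from the cache
```

The second read never reaches the database, which is the effect the caching measure is aiming for.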

Very low latency for the user experience in the browser, even though a large portion of the user base is far away

The application must be reachable by remote users with minimal network latency, which is usually achieved with a CDN.
Solution: with CloudFront, edge locations cache static content and speed up delivery of the web service to end users.

Effective distribution of load

Solution: Elastic Load Balancing distributes traffic across multiple back-end EC2 application instances, and the fleet scales automatically with the traffic load.
ElastiCache caches application data to reduce the read pressure on the database.
Database read replicas allow queries to scale out flexibly, so the database can handle a large volume of read operations.

A self-healing infrastructure that recovers from failed service instances

Service instances must be able to recover from failures.
Solution: trigger Auto Scaling through alarms defined on CloudWatch metrics.
You can create Amazon CloudWatch alarms that monitor Amazon EC2 instances. If an instance becomes impaired because of an underlying hardware failure or a problem that requires AWS involvement to repair, the instance can be recovered automatically.
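A sketch of such a recovery alarm is shown below. It builds parameters shaped like the CloudWatch PutMetricAlarm API, watching the system status check of one instance and attaching the EC2 recover action. The instance ID, the region, the alarm name and the two-minute evaluation window are assumptions for illustration, and no call to AWS is made.

```python
def recover_alarm(instance_id: str, region: str) -> dict:
    """Build keyword arguments shaped like the CloudWatch PutMetricAlarm API."""
    return {
        "AlarmName": f"recover-{instance_id}",       # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",    # system-level status check
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                                # seconds per data point
        "EvaluationPeriods": 2,                      # assumed: two failing minutes
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in action that asks EC2 to recover the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


# Placeholder instance ID and region, for illustration only.
alarm = recover_alarm("i-0123456789abcdef0", "us-east-1")
```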

Security of data at rest and in transit

Solution: for the service that the web layer exposes to users, an SSL/TLS certificate can be requested through AWS Certificate Manager.
Data security usually means encrypting data both while it is transmitted and while it is stored.
For data at rest, the stored data is usually encrypted.
For data in transit, the VPC isolates resources at the network layer, and only clients holding permissions granted through IAM can establish a connection. Data can be sent over SSL/TLS while in transit.

Securing access to the environment as the delivery team expands

Solution: use IAM to define users, roles and groups, and grant different people different permissions on different resources. For example, you can give some users full access to Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, Amazon Redshift and other AWS services. Other users can be given read-only access to certain S3 buckets only, permission to manage certain EC2 instances only, or access to billing information and nothing else. Nobody has to share a password or access key.
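The read-only S3 case mentioned above could be expressed as an IAM policy document along the following lines. This is a sketch only: the bucket name "startup-assets" is a made-up placeholder, and a real policy would be scoped to whatever the team actually needs.

```python
import json


def s3_read_only_policy(bucket: str) -> dict:
    """Return an IAM policy document granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself (for listing)
                    f"arn:aws:s3:::{bucket}/*",    # every object in the bucket
                ],
            }
        ],
    }


# "startup-assets" is a hypothetical bucket name.
policy_json = json.dumps(s3_read_only_policy("startup-assets"), indent=2)
```

Attaching a document like this to an IAM group lets new team members inherit the permission by joining the group, without sharing any credentials.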

An archival strategy for objects that have been inactive for more than six months

A storage container is needed into which logs and inactive objects can be moved on a regular basis.
Solution: S3 can be used to persist static objects such as deployment archives, scripts, database backup files, logs and media files.
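One way to automate the six-month archival is an S3 lifecycle rule. The sketch below builds a configuration shaped like the S3 PutBucketLifecycleConfiguration API that moves objects to the GLACIER storage class after 180 days. The choice of Glacier and of 180 days is an assumption added here, since the original answer only names S3. Note also that lifecycle rules act on object age since creation, which is only an approximation of "inactive".

```python
def archive_lifecycle(prefix: str = "", days: int = 180) -> dict:
    """Build a lifecycle configuration that archives objects after `days` days."""
    if days < 1:
        raise ValueError("days must be positive")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",   # hypothetical rule name
                "Filter": {"Prefix": prefix},         # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    # Assumed target class; the source does not name one.
                    {"Days": days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


lifecycle = archive_lifecycle()
```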

Multiple environments can be easily managed and replicated based on their blueprint architecture.

Solution: to replicate an environment, including into other regions, you can use AWS CloudFormation, which provides a common language for describing and provisioning all the infrastructure resources in your cloud environment. CloudFormation lets you use simple text files to model and provision, in an automated and secure way, all the resources your application needs across regions and accounts.
In addition, AWS Elastic Beanstalk can be used to deploy PHP applications quickly.
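To show what "one blueprint, many environments" can look like, here is a minimal sketch that generates a CloudFormation template with an environment parameter. The parameter name EnvName, the allowed values and the single S3 bucket resource are assumptions chosen to keep the example short; a real blueprint would describe the whole stack.

```python
import json


def environment_template() -> dict:
    """Return a tiny CloudFormation template parameterized by environment."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative blueprint reused for every environment",
        "Parameters": {
            "EnvName": {                                # hypothetical parameter
                "Type": "String",
                "AllowedValues": ["dev", "staging", "prod"],
                "Default": "dev",
            }
        },
        "Resources": {
            "AssetsBucket": {                           # hypothetical resource
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "Tags": [
                        {"Key": "Environment", "Value": {"Ref": "EnvName"}}
                    ]
                },
            }
        },
        "Outputs": {
            "AssetsBucketName": {"Value": {"Ref": "AssetsBucket"}}
        },
    }


template_json = json.dumps(environment_template(), indent=2)
```

The same file can then be launched once per environment, changing only the EnvName parameter value.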

Solution Design

System architecture:

The diagram was drawn with Freedgo Design, an online diagramming website.
Freedgo Design is online drawing software for many types of diagram. It can be used to create Alibaba Cloud, Tencent Cloud, Oracle Cloud and AWS architecture diagrams, system deployment diagrams, software architecture diagrams, UML, BPMN, ERD, flow charts, UX design diagrams, Ant Design layouts, mind maps and charts. Registered users can use it for free.
The specific drawing steps are as follows:

  • Open the Freedgo Design registration page and register as a user. Freedgo Design supports registration by email, WeChat, QQ, Weibo and other methods.
  • After registering successfully, click the start making button to open the drawing tool page.
  • From the menu, select File -> Create from type -> Cloud Architecture -> AWS.

Architecture Preview

[Figure: architecture diagram of the proposed AWS deployment]

Design Detail

Network layer

Route 53: provides the DNS resolution service and points to the CloudFront endpoint through a CNAME record.
CloudFront: provides the global content delivery network. User requests are routed to the edge location with the lowest latency, which gives the best performance for delivered content. CloudFront must be configured with the application's ELB as its origin.
AWS Region: the area in which the application is deployed. A Region contains multiple Availability Zones.

Application layer

Auto Scaling is integrated with ELB to make the application service available and scalable. The ELB is attached to the Auto Scaling group to provide load balancing: instances in the group are registered automatically and incoming requests are distributed across them. For availability, if an instance fails or goes down, Auto Scaling quickly detects the faulty machine and starts a new one to keep the service running. For scalability, the minimum and maximum size parameters of the group control how far the number of EC2 instances can grow or shrink automatically. The instances in the Auto Scaling group are spread across different Availability Zones to guard against the failure of a single zone.

Data layer

ElastiCache for Redis is used to improve the reliability of the production deployment, relieve the pressure that front-end requests put on the database, and reduce latency. It also helps with disaster tolerance.
The Redis replication group consists of one primary node that the application can read from and write to, and two read-only replica nodes. Data written to the primary node is replicated asynchronously to the read replicas, which protects against node failure. A cluster service is deployed in each of the two zones, mainly to guard against the failure of a zone.
Amazon RDS provides highly available, high-performance data storage. A highly available database usually contains two instances: a primary and a standby. All requests are sent to the primary, which answers the servers' requests and performs the reads and writes. Data is replicated synchronously between the primary and the standby. If the primary becomes unavailable because of a hardware or network failure, RDS detects the failure automatically and starts the failover process: the standby becomes the new primary and DNS is updated automatically, which gives a fast failover.

VPC & security group settings

Each layer has its own security groups and subnets, which gives a more effective security mechanism.
In the security group for the application layer Auto Scaling group, inbound rules open ports 80 and 443 so that Internet users can reach the site, and open SSH port 22 only to a specific source location.

The data layer security group has an inbound rule on port 3306 that allows the application to access MySQL / Aurora.
The ElastiCache security group has an inbound custom TCP rule on the port that the application uses to access Redis.
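The inbound rules described above can be written down as data. The sketch below builds rule entries in the IpPermissions shape used by the EC2 AuthorizeSecurityGroupIngress API. The security group ID, the admin CIDR range (taken from a documentation-only address block) and the Redis port 6379 are placeholders assumed for illustration; the original design only says "custom TCP" for Redis.

```python
from typing import Optional


def ingress_rule(port: int, cidr: Optional[str] = None,
                 source_group: Optional[str] = None) -> dict:
    """Build one TCP inbound rule, open to either a CIDR or a security group."""
    if (cidr is None) == (source_group is None):
        raise ValueError("give exactly one of cidr or source_group")
    rule = {"IpProtocol": "tcp", "FromPort": port, "ToPort": port}
    if cidr is not None:
        rule["IpRanges"] = [{"CidrIp": cidr}]
    else:
        rule["UserIdGroupPairs"] = [{"GroupId": source_group}]
    return rule


WEB_SG = "sg-web-placeholder"      # hypothetical ID of the application tier group
ADMIN_CIDR = "203.0.113.0/24"      # documentation-only range standing in for the office

rules = {
    "web": [
        ingress_rule(80, cidr="0.0.0.0/0"),        # HTTP from the Internet
        ingress_rule(443, cidr="0.0.0.0/0"),       # HTTPS from the Internet
        ingress_rule(22, cidr=ADMIN_CIDR),         # SSH from one location only
    ],
    "database": [ingress_rule(3306, source_group=WEB_SG)],  # MySQL / Aurora
    "cache": [ingress_rule(6379, source_group=WEB_SG)],     # assumed Redis port
}
```

Referencing the application tier's security group as the source, rather than an IP range, keeps the database and cache reachable only from the application instances even as the fleet scales.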


Server health can be monitored with CloudWatch, which tracks memory utilization, processor utilization, cache hit rate and other metrics across the whole system.


The requirements raised by the startup are exactly the problems that cloud platform providers need to solve: how to provide a manageable, high-performance, highly available and secure base platform, while making daily maintenance, releases and emergency response easy for users.
High performance: this is also one of the customer's concerns. At the time of writing, AWS covers about 11 major regions, 42 Availability Zones and 52 edge locations around the world, so it can deliver both high performance and high availability. AWS also offers many instance types at each layer, such as memory optimized and storage optimized, to meet different needs.
High availability: whether it is the Auto Scaling group in the application layer, the database, or S3, copies are kept in several Availability Zones, so that when one zone becomes unavailable the application can quickly switch to another.
Security: AWS protects the network and strengthens the security of Internet access through automated monitoring, and provides network security and isolation through VPC and security group controls.


Online diagramming tool:
Route 53 usage:
Route 53 console:
RDS console:…:id=csydb;is-cluster=false