Scalability must be part of the design process because it is not a discrete feature that you can add later.
Application scalability requires a balanced partnership between two distinct domains: software and hardware. You can make great strides in one domain only to sabotage them with mistakes in the other. For example, building a load-balanced farm of Web servers will not benefit a Web application that has been designed to run only on a single machine. Likewise, a highly scalable application deployed to machines connected by a low-bandwidth network will not handle heavy loads well once traffic saturates the network.

Distributed applications, which are designed as n-tier applications, are a step beyond traditional client-server applications. Such architectures promote the design of scalable applications by sharing resources, such as business components and databases.

3.3.1 Scaling Up
Scaling up is the commonly used term for achieving scalability using better, faster, and more expensive hardware. Scaling up includes adding more memory, adding more or faster processors, or simply migrating the application to a more powerful, single machine. Typically, this method allows for an increase in capacity without requiring changes to source code. Administratively, things remain the same since there is still only one machine to manage.

Upgrading a hardware component in a machine simply moves the processing capacity limit from one part of the machine to another. For example, a machine that is at 100 percent CPU utilization could increase capacity by adding another CPU. However, the limitation may shift from the CPU to the system memory. Adding CPUs does not add performance in a linear fashion. Instead, the performance gain curve slowly tapers off as each additional processor is added. For machines with symmetric multi-processor (SMP) configurations, each additional processor incurs system overhead. Consequently, a four-processor machine will not realize a 400 percent gain in capacity over the uniprocessor version. Once you have upgraded each hardware component to its maximum capacity, you will eventually reach the real limit of the machine’s processing capacity. At that point, the next step in scaling up is to move to another machine.
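The diminishing return from each added processor can be illustrated with Amdahl's law. The text above does not name this model; it is used here as a simplified stand-in that accounts only for the serial portion of a workload, not for SMP overhead specifically. The 90 percent parallel fraction is an assumed figure chosen for illustration.

```python
def amdahl_speedup(processors: int, parallel_fraction: float) -> float:
    """Ideal speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Assumed workload: 90 percent of the work is parallelizable.
    for n in (1, 2, 4, 8):
        print(f"{n} processors -> {amdahl_speedup(n, 0.9):.2f}x")
```

Under this assumption, four processors yield a speedup of about 3.1 times rather than 4, and the gap widens with each processor added.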

Scaling up also presents other potential problems. Using a single machine to support an application creates a single point of failure, which greatly diminishes the fault tolerance of the system. Although measures such as multiple power supplies can add redundancy to a single-machine system, they can be expensive.

3.3.2 Scaling Out

An alternative to scaling up is scaling out. Scaling out leverages the economics of using commodity PC hardware to distribute the processing load across more than one server. Although scaling out is achieved using many machines, the collection essentially functions as a single machine. By dedicating several machines to a common task, application fault tolerance is increased. Of course, from the administrator’s perspective, scaling out also presents a greater management challenge due to the increased number of machines.


3.3.3 Designing for Scalability

Good design is the foundation of a highly scalable application. At no other point in the lifecycle of an application can a decision have a greater impact on the scalability of an application than during the design phase.


The Scalability Pyramid


Application architects must consider scalability at all levels, from the user interface to the data store.

3.3.4 The Five Commandments of Designing for Scalability

The five commandments of designing for scalability below can be useful when making design choices.

Do Not Wait
A process should never wait longer than necessary. Each time slice that a process is using a resource is a time slice that another process is not able to use that resource. You can place processes into two separate categories, synchronous and asynchronous.
One way to achieve scalability is by performing operations in an asynchronous manner. When operating asynchronously, long-running operations are queued for completion later by a separate process.
For example, some e-commerce sites perform credit card validation during the checkout process. This can become a bottleneck on a high-volume e-commerce site if there is difficulty with the validation service. For e-commerce sites that must physically ship a product to fulfill an order, this process is a good candidate for asynchronous operation. Because possession of the product does not shift from the seller to the buyer during the online transaction, the retailer can complete the credit card validation offline. The application can then send e-mail to the customer confirming the order after validation of the credit card transaction.
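The deferred validation described above might be sketched as follows. This is a minimal illustration, not a production design: `validate_card`, `checkout`, and the order fields are hypothetical names, and an in-process queue with one worker thread stands in for whatever durable queue and separate process a real site would use.

```python
import queue
import threading


def validate_card(order: dict) -> bool:
    """Hypothetical stand-in for a slow call to a card validation service."""
    return order["card_number"].isdigit()


def worker(pending: queue.Queue, results: dict) -> None:
    """Validate queued orders away from the checkout path."""
    while True:
        order = pending.get()
        if order is None:  # sentinel: stop the worker
            pending.task_done()
            break
        results[order["id"]] = "confirmed" if validate_card(order) else "declined"
        pending.task_done()


def checkout(pending: queue.Queue, order: dict) -> str:
    """Accept the order immediately; validation happens later."""
    pending.put(order)
    return "accepted"


def run_demo() -> dict:
    pending = queue.Queue()
    results = {}
    thread = threading.Thread(target=worker, args=(pending, results))
    thread.start()
    checkout(pending, {"id": 1, "card_number": "4111111111111111"})
    checkout(pending, {"id": 2, "card_number": "not-a-number"})
    pending.put(None)   # no more orders
    pending.join()      # wait until every queued item is processed
    thread.join()
    return results
```

The checkout call returns as soon as the order is queued, so a slow validation service delays the confirmation e-mail rather than the customer.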

Do Not Fight for Resources
Contention for resources is the root cause of all scalability problems. It should come as no surprise that insufficient memory, processor cycles, bandwidth, or database connections to meet demand would result in an application that cannot scale.
You should order resource usage from plentiful to scarce. For example, when performing transactions that involve resources that are scarce and thereby subject to contention, use those resources as late as possible. By doing so, transactions that are aborted early will not have prevented or delayed a successful process from using these resources. Acquire resources as late as possible and then release them as soon as possible. The shorter the amount of time that a process is using a resource, the sooner the resource will be available to another process. For example, return database connections to the pool as soon as possible.
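A minimal sketch of "acquire late, release early" follows. The `ConnectionPool` class and `place_order` function are hypothetical, and the pooled "connections" are plain strings standing in for real database connections.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Toy pool of connection tokens (hypothetical, for illustration)."""

    def __init__(self, size: int) -> None:
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    @contextmanager
    def connection(self):
        conn = self._free.get()       # acquire
        try:
            yield conn
        finally:
            self._free.put(conn)      # release even if the body raises

    def available(self) -> int:
        return self._free.qsize()


def place_order(pool: ConnectionPool, quantity: int) -> str:
    # Plentiful work first: validate input before touching the scarce resource.
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    # Scarce resource last, held only for the duration of the write.
    with pool.connection() as conn:
        receipt = f"wrote {quantity} via {conn}"
    # The connection is already back in the pool at this point.
    return receipt
```

A request that fails validation never takes a connection, so it cannot delay a request that would have succeeded.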

Design for Commutability
Designing for commutability is typically one of the most overlooked ways to reduce resource contention. Two or more operations are said to be commutative if they can be applied in any order and still obtain the same result. Typically, operations that you can perform outside a transaction are likely candidates.

For example, a busy e-commerce site that continuously updates the inventory of its products could experience contention for record locks as products come and go. To prevent this, each inventory increment and decrement could become a record in a separate inventory transaction table. Periodically, the database sums the rows of this table for each product and then updates the product records with the net change in inventory.

Design for Interchangeability
Whenever you can generalize a resource, you make it interchangeable. In contrast, each time you add detailed state to a resource, you make it less interchangeable.
Resource pooling schemes take advantage of interchangeable resources. COM+ component pooling and ODBC connection pooling are both examples of resource pooling of interchangeable resources.
For example, if a database connection is unique to a specific user, you cannot pool the connection for other users. Instead, database connections that are to be pooled should use role-based security, which associates connections with a common set of credentials. For connection pooling to work, all details in the connection string must be the same. Also, database connections should be explicitly closed to ensure their return to the pool as soon as possible. Relying on automatic disconnection to return the connection to the pool is a poor programming practice.
The concept of interchangeability supports the argument to move state out of your components. Requiring components to maintain state between method calls defeats interchangeability and, ultimately, scalability is adversely impacted. Instead, each method call should be self-contained. Store state outside the component when it is needed across method calls. A good place to keep state is in a database. When calling a method of a stateless component, any state required by that method can either be passed in as a parameter or read from the database. At the end of the method call, preserve any state by returning it to the method caller or writing it back to the database.
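As a rough illustration of a stateless component, the sketch below keeps all state in an external store and none on the component itself. `CartService` and `CART_STORE` are hypothetical names, and the dictionary is only a stand-in for a database table.

```python
from typing import Dict, List

# Stand-in for a database table keyed by cart ID (hypothetical storage).
CART_STORE: Dict[str, List[str]] = {}


class CartService:
    """Stateless component: holds no per-user data between method calls."""

    def add_item(self, cart_id: str, item: str) -> List[str]:
        items = list(CART_STORE.get(cart_id, []))  # read state from the store
        items.append(item)
        CART_STORE[cart_id] = items                # write state back
        return items                               # and return it to the caller
```

Because no instance remembers anything between calls, any instance can serve any request, which is what makes the instances poolable.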
Interchangeability extends beyond resource pooling. Server-side page caching for a Web application will most likely increase its scalability. Although personalization can give a user a unique experience, it comes at the expense of creating a custom presentation that you cannot reuse for another user.

Partition Resources and Activities
Finally, you should partition resources and activities. By minimizing relationships between resources and between activities, you minimize the risk of creating bottlenecks resulting from one participant of the relationship taking longer than the other. Two resources that depend on one another will live and die together.
Partitioning of activities can help ease the load that you place on high cost resources. For example, using SSL entails a significant amount of overhead to provide a secure connection. As such, it is sensible to use SSL only for pages that actually require the increased security. In addition, Web servers dedicated to the task could handle SSL sessions.
However, partitioning is not always a good choice. Partitioning can make your system more complex. Dividing resources that have dependencies can add costly overhead to an operation.


3.3.5 Testing for Scalability

Careful planning and development are necessary for any application development project. However, to make a truly scalable application, it is important to rigorously and regularly test it for scalability problems. Scalability testing is an extension of performance testing. The purpose of scalability testing is to identify major workloads and mitigate bottlenecks that can impede the scalability of the application.
Use performance testing to establish a baseline against which you can compare future performance tests. As an application is scaled up or out, a comparison of performance test results will indicate the success of scaling the application. When scaling results in degraded performance, it is typically the result of a bottleneck in one or more resources.
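One simple way to compare a scaled run against the baseline is to express the measured throughput as a fraction of the ideal linear gain. The function below is a sketch under that assumption; the function name and the requests-per-second figures in the usage note are illustrative and not taken from any real test.

```python
def scaling_efficiency(baseline_rps: float, scaled_rps: float, scale_factor: float) -> float:
    """Fraction of the ideal (linear) throughput gain actually achieved.

    baseline_rps: requests per second measured in the baseline test
    scaled_rps:   requests per second measured after scaling up or out
    scale_factor: how much capacity was added (for example, 4 for 1 -> 4 servers)
    """
    if baseline_rps <= 0 or scale_factor <= 0:
        raise ValueError("baseline_rps and scale_factor must be positive")
    return scaled_rps / (baseline_rps * scale_factor)
```

For instance, if a hypothetical baseline of 200 requests per second on one server rises to 640 on four servers, the efficiency is 0.8. A value well below 1.0, or one that falls as capacity is added, suggests a bottleneck in some shared resource.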
When your application does not meet performance requirements, you should analyze data from the test results to identify bottlenecks in the system and to hypothesize a cause. Sometimes the test data is not sufficient to form a hypothesis, and you must run additional tests using other performance-monitoring tools to isolate the cause of the bottleneck.
