The Modern Quality Engineering eBook
Explore how to reduce cost to operate and release to market faster while staying Customer-centric, DevOps-oriented and Quality-focused through Just Enough Quality and Next Gen Automation strategies.
Superior Customer Experience with Just Enough Quality
When an interaction fails to meet your customers’ expectations, revenue declines. In the current digital age, customers expect you to anticipate and meet their needs now, to know them and build meaningful personalization into every interaction, and to deliver timely, flawless interactions with significant results.
In response to this shift from customers accepting a product-centric business to expecting a customer-centric approach, in the last decade, two massive movements have taken place and completely transformed the way we develop modern applications – Agile and DevOps. These movements promise faster time to market and flexibility in rapidly meeting changing customer expectations. For some organizations, this is reflected by a sweeping, intentional change requiring teams to restructure, retool, and redefine the way new functionality is released. For others, this is a loosely structured, “self-improvement journey” based on a series of incremental changes.
Regardless of the path taken, Quality Engineering (QE) plays an increasingly important role in achieving the desired customer experience, and the current digital age has fundamentally shaped the new Quality Assurance and Software Testing mindset to:
- Support increasingly faster application releases
- Happen within smaller and smaller timeframes
- Be more business-focused and collaborative, rather than siloed by technology or function
On top of that, organizations are still expected to keep optimizing and increasing productivity, as they always have been.
What does this mean for quality organizations? Well, there are some extremes that organizations have taken – some of them risky and detrimental. Some companies have opted to break down the TCOE (Testing Centers of Excellence) into distributed, completely product-siloed testing – which is not entirely wrong, but mostly incomplete. In fact, more and more TCOEs are being deconstructed each year (World Quality Report, 2018-19), and we see an increasing shift towards distributed organizations. But is this the right move and model? In recent years, we’ve seen companies switch from centralized to distributed models and back again – one or the other does NOT work – you need a little bit of both.
The other extreme, which frankly we hope we don’t see more of, is complete reliance on developers to do testing. It is true that in an ideal world, we would not need testers because developers code well, and with test-driven development we have a safety net to ensure fewer problems. But this is akin to saying that ideally, people are good, fully adhere to laws, and can self-enforce and self-correct; it simply doesn’t happen in real life. Laws need enforcers and champions – we need police in our engineering organizations.
There are exceptions to this, of course – the most famous example of having no dedicated testers being Facebook – but this requires two things. First, company-wide awareness that things will most definitely fail more frequently in production, and thus a high tolerance for failure. Second, architecture and code specifically constructed for developer-driven testing. Facebook is self-aware that things fail, but at the end of the day, it is a non-critical service, and it has no true competitor today that will challenge it on the basis of failures or even major issues (as the highly publicized events of 2018 and 2019 showed). The company was also stood up with this approach in mind – a solid supporting infrastructure is key. This is close to impossible for a company with millions of lines of code expanded gradually over decades, a patchwork architecture, and a previous reliance on independent testers.
Expectations of the Current Digital Age
1. Faster Time to Market and More Frequent Releases
Technical capability – hardware, software, and infrastructure – is now very accessible, leveling the playing field for organizations of all sizes. A 50-person startup can now push out new features at the same rate as, or faster than, a 50-billion-dollar enterprise, and gaining the upper hand means getting ideas into the hands of end users faster.
2. In-Flight Course Correction
Gone are the days when we lock in requirements more than a year ahead of time. Today’s consumer is smarter and their needs are more fluid, and businesses should operate the same way. Just as rapid time-to-market cycles are necessary, so is the ability to adapt to quickly changing business needs.
If a competitor pushes out a new product or feature that challenges, conflicts with, or overshadows something already in-flight, the modern organization must be able to course-correct, revise, and add to or change ongoing work – something that simply was not possible before. The deeper involvement of business roles in the engineering life cycle promotes this as well. Where previously the only touchpoint with the business was through business analyst (BA) roles – and even then with limited engagement within projects – the product owner (PO) role, and the tight coordination it requires, allows for more dynamism and the ability to adapt on the fly.
3. Incremental and Iterative Delivery
This is both an approach and a capability – iterative and incremental delivery could be viewed merely as a means of achieving faster releases, but it enables much more. Most important are two concepts: small, incremental changes are a way to test the market with something not too drastic that may still have significant effects (similar to the idea of hypothesis or A/B testing); and they allow organizations to intentionally deliver releases with known, live bugs – something controversial for traditional Quality Engineering (QE). Incremental and iterative delivery now makes it possible to revert small changes without fear of catastrophic issues or downtime, putting a lot of focus on the production environment. This is in line with the whole idea of DevOps and important to note.
4. Collaborative and Synergistic Teams
A fundamental tenet to Agile and DevOps is the idea of teams that work collaboratively towards the same goal, which is usually towards the success of a product. Organizations have interpreted this idea in various ways – full-stack developers, co-located teams, physically open workspaces per team, or loose role definitions, among others. While there are multiple advantages brought about by each approach, organizations should carefully consider the repercussions and side effects of each, including the risk.
For instance, we have seen many of the open workspaces built for Agile teams create frustration and general unhappiness for those involved. In the discipline of QE, the idea of transferring testing responsibilities fully to developers, or of focusing entirely on tactical QE for a single product, opens up huge problems that could have been avoided.
We have painted a vivid picture of why and how organizations are changing, and it’s about time that we discuss how to uphold the ideas of Quality Engineering while operating in this new landscape. The goal of the QE Practice is now defined by adherence to the new mindset of QE and the four general tenets of Agile and DevOps. RCG has established three philosophies that address and reflect the expectations of the modern digital age, defining what Modern Quality Engineering should be.
Modern Quality Engineering functions at the Product level and at the Enterprise level
What you lose by going fully distributed, though, is the ability to implement an organization-wide strategy for common approaches, techniques, and innovation. It also rules out many other things:
- Standard process and metrics
- Cross-product and cross-system integration testing
- Streamlined automation approaches
- Shared and reused quality engineering assets
- Holistic risk management
- Re-prioritization / re-allocation of QE personnel
This narrow focus essentially forfeits the ability to improve quality engineering at the enterprise level.
Instead, RCG establishes the ideal QE Practice for Agile and DevOps organizations by having both an enterprise and a scrum level – a hybrid team. So what does a centralized-distributed hybrid team look like? It is essentially a group of QE practitioners made up largely of product-focused experts, operating under the direction of a central QE governance body that runs the community of practice. In practice, 90-95% of these team members’ effort is dedicated to testing the product, while the remainder is spent syncing with the practice. There can be many ways to implement the reporting structures, but the ‘syncing’ aspect (illustrated by the two-way arrows in the next figure) represents both the guidance and the assets provided by the practice. Additionally, the feedback loop from the scrum-embedded quality engineers is the mechanism the practice uses to calibrate and adjust processes, as well as to prioritize the creation of shared assets.
The scrum QE team and the QE practice work in tandem: they drive distributed and centralized functions collaboratively while operating largely autonomously from each other. The distributed functions are the activities that happen within a sprint, as shown in the figure that follows. Inside the sprint, quality engineers perform test planning, test case design, execution, defect management, and reporting – as directed by the scrum master and according to the business priorities of the product owners.
Outside of the sprint are the ongoing functions of the QE practice. We propose three major functions essential to the practice:
1. Quality Strategy and Management
This is the overarching governance of the community of practice. It defines the enterprise-level priorities of the QE organization – which key product releases are coming and where the highest-risk areas are – ultimately deciding where QE resources and efforts need to be more or less focused. It also includes defining standard processes and metrics to make collaboration possible and more efficient. That said, not every organization needs the same process and metrics, nor does every team within an organization. We do not espouse a rigid, one-size-fits-all set of processes and metrics. Standardization only really makes sense at the level where processes enable operational efficiencies and metrics enable quantitative improvements.
2. Integration and Regression Testing Services
The second centralized QE function is optional for smaller organizations and those with non-integrating products. It focuses on the end-to-end integration and regression testing of highly complex, integrated applications, which tend to be tested only at the product level and can therefore break at their integration points. For instance, on a sophisticated e-commerce site, the end-to-end flow of logging in, searching for a product, selecting an alternative recommendation, adding it to the shopping cart, and checking out probably touches multiple applications or services. The team responsible for the recommendations service is concerned with the accuracy and availability of that service, and its testing rightfully has that focus – but that means end-to-end testing needs to happen across the flow and between the steps. If the recommendations team tests the steps connecting to its service, that is great, but it may also be redundant if another service (for instance, search) tests its own integration with the recommendations service. Again, the level and scale at which this activity needs to be done depends on the complexity and integration of the parts, and on how much integration testing is done within the product (or service) scrum teams.
3. Automation and Infrastructure Services
This is the third function of the centralized QE practice. Many organizations try to implement automation as part of the scrum QE functions, but this leads to many redundancies and inefficient use of resources, not to mention incompatibility for end-to-end testing. With the large number of available open-source and commercial automation tools, there are multiple ways to automate, each with different pros and cons. If each team drives its own approach, it is inevitable that teams will choose different tools – making upkeep and license costs vastly higher than they need to be. Many of these tools, while having different advantages, largely address common testing needs and can be supplemented where needed. There is no perfect tool and no perfect software company, and this is even more true across multiple teams. A good example is web service automation testing: APIs mainly exchange JSON or XML, which means a tool that can read both is useful to most services teams. The other problem with teams selecting their own tools and frameworks is the inability to share test scripts and test assets, as well as the inability to build automated end-to-end test scripts.
The other aspect of this function is testing infrastructure, which mainly relates to testing dependencies – the most pertinent usually being test data management and test environments. The same concepts of compatibility and asset sharing are at play here. If a central test data management team provides end-to-end test data, it serves more teams, reuse improves, and redundancy is minimized. Similarly, product teams will always need their own test environments to some extent, but the ability to share these assets decreases operational costs significantly. Imagine products maintaining their own stress testing environments, used infrequently and mostly sitting idle, instead of reusing them for other products and purposes.
The final piece of this model is to understand how the in-sprint and practice functions work together, going back to that two-way arrow representing the feedback loop. Although all product teams are expected to vary and deviate from the standards according to the needs of the business and dependencies in technology, for the most part the in-sprint functions are governed by the process and metrics defined by the central practice. The practice also provides the scrum teams with automation test assets to accelerate their work. At the end of each sprint, all scrum QEs provide the practice with the metrics collected as part of their work – which the practice consolidates and analyzes to make broader strategic adjustments and revisions to processes, metrics, tools, and approaches.
Modern Quality Engineering is Just Enough Quality
Gone are the days when quality-related activities were considered a necessary evil, and an unquantified, unjustified cost to organizations. With the tighter interlinking of business and technology teams, it has become vastly easier to drive decisions based on data, and this idea extends to the concept of QE. Good Quality Engineering entails a proper evaluation of what is just enough quality for every organization. This is a function of an organization’s risk threshold and the overall total cost of quality. Quality Engineering in this modern age needs to be scaled appropriately according to these two major criteria.
Risk Threshold
This is a mostly subjective criterion and depends on many factors: an organization’s exposure to regulatory policies, the sensitivity of its consumers, its exposure to publicity, the criticality of its business, the potential financial impact, and the size of its user base, among others. The risk threshold is managed at the enterprise level by the QE Community of Practice as a means to manage releases dynamically. An organization with a well-defined risk threshold can use these factors, expanded to include ongoing considerations such as the readiness of the supporting organization, the reliability of the infrastructure, and roll-back and recovery capabilities, to form a risk scoring system for predicting and managing major releases – making informed calls to defer, abandon, delay, or even expand a release depending on its risk score at multiple points in the release.
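As a minimal sketch of such a scoring system (the factor names, weights, scale, and threshold below are all hypothetical, not prescribed by any specific framework), one simple form is a weighted sum of factor scores compared against an organization-defined threshold:

```python
# Hypothetical risk scoring sketch: each factor is scored 0-10 and
# weighted, and the total is compared to a release risk threshold.

RISK_WEIGHTS = {
    "regulatory_exposure": 0.25,
    "consumer_sensitivity": 0.15,
    "business_criticality": 0.25,
    "financial_impact": 0.20,
    "user_base_size": 0.15,
}

def release_risk_score(factor_scores: dict) -> float:
    """Weighted sum of factor scores (each on a 0-10 scale)."""
    return sum(RISK_WEIGHTS[name] * score for name, score in factor_scores.items())

def release_decision(score: float, threshold: float = 6.0) -> str:
    """Map a risk score to a coarse recommendation for the release."""
    if score >= threshold:
        return "defer"      # risk exceeds the organization's threshold
    if score >= threshold - 1:
        return "review"     # borderline: re-score at the next checkpoint
    return "proceed"

scores = {
    "regulatory_exposure": 3,
    "consumer_sensitivity": 5,
    "business_criticality": 7,
    "financial_impact": 4,
    "user_base_size": 6,
}
print(release_decision(release_risk_score(scores)))
```

Re-scoring at multiple points in the release (as the readiness and reliability factors change) is what makes the deferral or go-ahead call dynamic rather than one-time.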
Total Cost of Quality (TCoQ)
This second piece is essentially the difference between the cost of failure and the cost of control. The cost of failure represents the cost of incidents and issues – including the revenue lost to downtime or service interruption as well as the effort to resolve and fix them. The cost of control, on the other hand, is the sum of the organization’s static (process and policy) and dynamic (software testing) efforts to prevent live issues. We strive for a significantly positive TCoQ; ideally, the cost of failure will be several times the cost of control. If the TCoQ is negative, the cost of control needs to be driven down, and in that case it is perfectly acceptable (even encouraged) to simplify and streamline processes and scale back testing activities, even if it means possibly increasing issues in production.
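In arithmetic terms, the definition above reduces to a simple subtraction (the figures below are invented purely for illustration):

```python
def total_cost_of_quality(cost_of_failure: float, cost_of_control: float) -> float:
    """TCoQ as defined here: cost of failure minus cost of control.
    A positive value means control spend is smaller than the failure
    cost it offsets; a negative value signals over-spending on control."""
    return cost_of_failure - cost_of_control

# Hypothetical annual figures (illustrative only):
cost_of_failure = 1_200_000  # downtime revenue loss + incident resolution effort
cost_of_control =   300_000  # process/policy (static) + software testing (dynamic)

tcoq = total_cost_of_quality(cost_of_failure, cost_of_control)
ratio = cost_of_failure / cost_of_control
print(tcoq, ratio)
```

In this invented example the cost of failure is four times the cost of control, which is the “several times over” situation the text describes as ideal.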
Effective management of TCoQ means process simplification and economic software testing. If a process continually slows work down or does not significantly reduce risk, that is a major sign the work is over-processed. We are looking for a balanced level of process complexity, which tends to track the size of the organization – the smaller the team, the less standardization it tends to require. As an example, a tennis doubles team needs to pre-define fewer plays than an American football team, but the doubles team still has a general strategy for how to work together.
At the same time, metrics standardization should follow this philosophy as well. Every set of metrics should have a goal: what it measures and what it is meant to improve. If you cannot define what a certain metric helps improve, you most likely do not need it. A good approach to metrics is to start with more than you need and, over time, cut down to what is essential to improving your delivery and what you can actually change and influence. For instance, it may not be useful to track defect resolution (or defect turnaround) time if you have 500 defects that have been open for several months.
Economic software testing is a slightly more complex activity. Software testing should scale not only to the risk level of the organization and the cost of failure, but also to a third factor – the available time window in the sprint or release. Every experienced tester knows that when push comes to shove, it is the testing window that gives way when changing requirements or code drop delays shift the timeline. All testing activities supporting Agile and DevOps delivery therefore need to be able to scale (mostly scale down) to tighter testing windows. That means testers need to be constantly aware of which transactions are most critical or most vulnerable, both generally for the business and for the current release. It also means test teams must be able to select or build the right tests at a moment’s notice, report whether the result matches the risk threshold expectations set by the organization (or QE Community of Practice), and push for a decision to delay or proceed with a release according to the deviation from the risk threshold. Obviously, this selection is more extensive when you can execute more tests within a limited timeframe, and that is where our final approach enters – a focus on Next Generation Automation.
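One hedged sketch of scaling testing to a shrinking window (the test names, risk scores, and runtimes are invented; real test-optimization tools use far richer models): greedily select the tests with the best risk-per-minute until the window is full.

```python
# Hypothetical risk-based test selection for a limited test window:
# greedily pick tests by risk density (risk / runtime) until time runs out.

tests = [
    # (name, risk score, runtime in minutes) - illustrative values only
    ("checkout_end_to_end", 9, 30),
    ("login_smoke",         8, 5),
    ("search_regression",   6, 20),
    ("profile_update",      3, 15),
    ("newsletter_signup",   1, 10),
]

def select_tests(tests, window_minutes):
    """Return test names chosen greedily by risk density for the window."""
    chosen, remaining = [], window_minutes
    for name, risk, runtime in sorted(tests, key=lambda t: t[1] / t[2], reverse=True):
        if runtime <= remaining:
            chosen.append(name)
            remaining -= runtime
    return chosen

# A 40-minute window keeps the highest-density tests and drops the rest.
print(select_tests(tests, window_minutes=40))
```

The point of the sketch is the shape of the capability: the selection is recomputed whenever the window shrinks, and whatever falls outside the window is reported against the risk threshold rather than silently skipped.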
Modern Quality Engineering relies on Next-Generation Automation
Although we have seen in the previous sections that there are many approaches and functions that are necessary to Modern Quality Engineering, we also need to acknowledge that a major part of managing a positive Total Cost of Quality entails efficiencies in testing, and as importantly, the ability to expand test coverage within tighter windows. This is virtually impossible without a well-managed automation capability.
We emphasize that automation must be well-managed. We have seen many (in fact, a large majority of) organizations that try to stand up automation on their own ultimately fail to make long-term gains, especially when these efforts are do-it-yourself attempts to automate from square one rather than efforts led by experienced leaders, professionals, or consultants. Automation requires strategy and technical experience to succeed, and as discussed previously, testing should ideally be run from a community of practice to ensure that tool and framework selection is done with the greater good of the organization in mind.
Modern automation also goes beyond just automating testing; it now involves automating the dependencies in testing (which is no longer test automation, but essentially process automation) as well as automating cognitive and creative functions and decisions required in testing. These other requirements are illustrated in the levels of test automation diagram below, described as levels 4 and 5, respectively. Levels 1-3 represent what is widely known as ‘Test Automation’ in the industry.
Level 1
is the ability to automate for different system platforms. The easiest is usually automating for web user interfaces (UI); as the organization requires, automation solutions should also be defined for desktop applications, mobile apps, and cloud-based applications, along with embedded software for IoT (Internet of Things) and any legacy mainframe and midrange systems.
Level 2
is the ability to automate for integrations. A major part of the DevOps movement is the emphasis on microservice architecture, and this consequently means that integrations through APIs and web services have become and will continue to be the approach of choice in modern application design. Testing needs to accommodate not only what happens via the UI at the presentation layer, but also data passed on through APIs. This includes external and internal APIs and, where there are still data connections, an ability to validate on the back-end via the data lake.
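As an illustration of this level (the payloads and field names below are hypothetical; in a real test the bodies would come from an HTTP call to the service under test), validating what an API carries beyond the UI can be as simple as parsing the JSON or XML body and asserting on the data:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical API response bodies for an imaginary order service.
json_body = '{"order_id": 1001, "status": "confirmed", "total": 59.98}'
xml_body = '<order><id>1001</id><status>confirmed</status></order>'

def check_json_order(body: str) -> bool:
    """Assert the JSON payload carries the fields consumers rely on."""
    data = json.loads(body)
    return data["status"] == "confirmed" and data["total"] > 0

def check_xml_order(body: str) -> bool:
    """The same contract check against an XML payload."""
    root = ET.fromstring(body)
    return root.findtext("status") == "confirmed"

print(check_json_order(json_body), check_xml_order(xml_body))
```

This is also why a tool (or shared framework) that reads both JSON and XML serves most services teams, as noted earlier.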
Level 3
is the ability to automate various types of testing. A lot of traditional testing is entirely functional – smoke, regression, integration, etc. Even in production, we have monitoring systems that can be expanded to act as ‘live tests’ that also provide important usage data as input to the design of future tests. Other testing types are non-functional or a mix of both, including performance testing, usability testing, and security testing. These all have various sub-types according to general and specific needs.
These first three levels cover what standard test automation is.
Level 4
goes beyond test automation and into process automation, and it also creates the blueprint for true Continuous Testing (or CI/CD testing). Once all critical test types are automated, the next step is to automate the processes that feed the testing activities but are not tests themselves. In testing, there are five major dependencies that need to be in place before testing happens.
- Test data and test environments, while complex to handle, are self-explanatory. They remain some of the toughest challenges for QE organizations, but they become increasingly manageable with some automation in place – in particular, automated test data management solutions and automated provisioning of test environments.
- Availability pertains not only to the application under test but also to the availability of the necessary integrations or services. In this case, service virtualization solutions help alleviate the challenges of different release and availability schedules across applications.
- Access pertains to the ability to both simplify and broaden the availability of test assets and to personalize privileges to the right test suites to the right resources and team members.
- Finally, process pertains to the right timing and prerequisites for testing in conformance with organizational policy and process. Automating the process eliminates the gap between getting the go signal and running the tests. While this may not seem significant, when deployments and builds become more and more rapid – every day, or even multiple times a day – having tests trigger automatically, instead of waiting for a tester to run them, stacks up in a major way. It is even more impactful when you consider that this also enables unattended test execution: when a deployment is made in the middle of the night, tests run rather than waiting for the next business day.
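The trigger logic behind that last point can be sketched in a few lines (the event shape and function names here are hypothetical; real pipelines wire this up through CI/CD webhooks or pipeline stages):

```python
# Minimal sketch of deployment-triggered, unattended test execution.
# A deployment event to the test environment starts the regression
# suite immediately - no human in the loop, day or night.

def run_regression_suite(build_id: str) -> str:
    # In practice this would invoke the actual test runner or CI job.
    return f"regression suite started for build {build_id}"

def on_deployment(event: dict) -> str:
    """Called by the pipeline whenever a build lands in an environment."""
    if event.get("environment") == "test":
        return run_regression_suite(event["build_id"])
    return "no action"

print(on_deployment({"environment": "test", "build_id": "2024.10.5-nightly"}))
```

The value compounds with deployment frequency: the more often builds land, the more waiting time this removes.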
Level 5
is the highest level, and it pertains to the automation of cognitive and creative effort as it relates to testing, achievable mainly through artificial intelligence (AI) and machine learning (ML) technologies. Although most of these offerings still lack the maturity and robustness of the enterprise testing software that predates Agile, a handful of software companies have made notable progress in leveraging AI and ML in testing. There are four major aspects of the software testing effort that can be automated using AI and ML:
- One area is test generation – traversing an application and writing (or, more accurately, generating) test scripts and scenarios while optimizing coverage.
- Automated test maintenance is especially important as a starting point, as one of the biggest challenges in being successful long-term in automation is keeping the maintenance effort low. Most organizations still reach a point where most of the automation effort is spent on fixing and updating broken tests. Consequently, this is where most AI and ML-based testing software developers have put the effort, and it remains the most promising area in terms of fully maturing.
- Another area is exploration, or automatically traversing possible paths and generating test cases based on those paths.
- Yet another is test optimization, which is the intelligent selection of test cases based on certain parameters (time available, riskiest functionality, defect concentration, most critical business function, etc.). Fewer companies have made headway in this area, but there has certainly been marked progress in the last couple of years.
Modern Quality Engineering Impacts the Bottom Line
Organizations have largely failed to convert and transform their QE teams and programs toward this goal, often failing to achieve a balance of old and new approaches and instead either ignoring the change or reflexively making wholesale reversals. The result is a painful transformation with no long-term benefit, and in some cases the result actually does more harm than good.
RCG offers help in three capability areas – the Agile DevOps Quality Functions, Just Enough Quality, and Next Generation Automation. We believe that bringing in the expertise in these areas sets up a well-balanced, intelligently managed, cost-effective and resilient Quality Engineering practice for your organization. If your Quality Engineering efforts do not generate revenue or operational cost savings, you’re still behind – and we can help.