Conf42 Site Reliability Engineering (SRE) 2024 - Online

The challenges of Platform teams (and a few tips & tricks to overcome them)

Abstract

If you are a platform engineer or a platform manager and you want to improve the efficiency of your team, this talk is for you.

Summary

  • Marco has been working for the last three years at a consultancy company called Thoughtworks. His true passion lies in animal rights activism. Today we will discuss the challenges platform teams face in their day-to-day operations, as well as tips and tricks on how those challenges can be overcome.
  • Developers need to be put at the center of all new product development. Two categories that are tied together are simplicity and usability: a product should be both simple and usable. A common problem when writing software for developers is that they traditionally give little feedback.
  • There can be a common distrust of platform technologies, especially from senior stakeholders. Using so many different technologies means that platform teams need to spend a significant amount of their time keeping up to date with this changing landscape. How can we overcome these challenges?
  • Platform teams should make sure that the internal platform roadmap is aligned with the roadmaps of the products they serve. It's important to build a platform product as an ecosystem where different solutions integrate. Evaluation techniques should be in place and should be approved by the organization at large.
  • We talked about the most common challenges that platform teams face in their day-to-day operations. We analyzed how those challenges can be addressed, and even how platform teams can shift their mindset into a more agile and product-driven one. Thank you very much for attending.

Transcript

This transcript was autogenerated.
Good morning, good afternoon, everybody. Today we are going to discuss the challenges platform teams face in their day-to-day operations, and then we're going to discuss tips and tricks on how those challenges can be overcome. First, a little bit about myself. My name is Marco; I'm Italian-born, but I'm based in southern Spain. I've been working at Thoughtworks for about three years. I like to call myself a technologist and a tinkerer, and my true passion lies in animal rights activism.
Thoughtworks is a consultancy company. It's US-based, but it has offices all around the world. We've been building extraordinary impact together with our clients worldwide; we try to amplify positive social change and revolutionize the tech industry. Across the years we created innovative products such as CruiseControl, which was the first continuous integration server. We created Selenium. We invented the Technology Radar, which you might be familiar with, and we keep pushing the envelope to make the technology landscape better. We also have technical expertise that is showcased by the sheer number of technical books we publish with O'Reilly and other publishers. So we like to say that whatever technical topic you can think of, we have written the book on it.
The agenda for today is as follows. First, we'll delve into the unique challenges that are specific to platform teams and platform products. Then we're going to spend some time discussing how these challenges can be overcome, and we'll conclude with a couple of zoom-ins on practices and products.
Starting with the challenges, the first group is definitely around UX, and here we can highlight five categories. Centricity is the first one: developers need to be put at the center of all new product development. We need to bear in mind what their needs are, and we need to understand what their requirements and aspirations are.
We then need to be able to provide them with a consistent experience across different products and services, so that the experience is seamless and they don't even notice when they are using one service or another. A common problem when writing software or products for developers is that developers traditionally give little feedback. They are quite eager to get feedback on their own work, but whenever a tool doesn't fit their objectives, they just drop it and look for another one. Two categories that are tied together are simplicity and usability, and we should aim for both, meaning that a product should be both simple and usable. But whenever there's a trade-off between the two, we should always favor usability. We don't want a product that only offers the most basic functionality just for the sake of being simple; we want it to be easy to use and to provide an additional range of functionality that caters to advanced users.
The next set of challenges is around business. Here we can think of the common lack of insight platform teams have into the long-term organizational strategy, and even into the roadmaps of individual products. When this happens, it becomes harder for platform teams to have the functionality product teams need ready in time, and battle tested. There can also be a common distrust of platform technologies, especially from senior stakeholders. When this happens, it becomes harder to embrace cloud-native technologies, to move to the cloud, and to embark on innovative initiatives such as cloud carbon footprint measurement. Finally, there can be a misconception about what the ideal operating model for platform teams should be. Here we can think of platform teams that are thought of as support teams, or as CI/CD squads that only focus on helping development or product teams when they get stuck.
The next set of challenges revolves around technology. Here we can think of the high cognitive load platform teams are subject to.
All the CI/CD, monitoring, instrumentation and database tools that the organization needs in order to function require somebody who is well versed in them, and most often these people are in the platform team. In order to support the product teams that use these technologies, platform teams need to work with a number of different tools at the same time, and this also requires a lot of context switching. Then, platform teams should really focus mostly on their intended roadmap: they should have their own product backlog and make constant improvements to make sure that development teams get the functionality they need. But most often they get interrupted and need to firefight and help product teams with urgent escalations. On top of this, there can be additional workloads coming from business requests, and platform teams can become the point of contact for third-party vendors. All of this steals time from their work on the internal roadmap. Platform teams can also feel a sense of disconnection, because they are at least two levels removed from the business, so they might not feel that they are contributing much to the overall success of the organization.
There's also a set of challenges that includes working with many different stacks, solutions and vendors. Here we can think of platform teams having to interact with product teams using different tech stacks, such as Java, Kotlin, .NET, Golang and Python, and having to support them. Then there's the presence of many different vendors they need to interact with, and there's always a platform team that is responsible for the interaction between these vendors and the product teams needing their services. Using so many different technologies means that platform teams need to spend a significant amount of their time keeping up to date with this changing landscape. Not doing so can be a security risk, and it can mean missed opportunities if we are not able to use new features whenever there's an upgrade.
New services can also mean that we might be using something that is not enterprise grade or does not meet the SLAs and SLOs required by our company. Platform teams might need to step in in those circumstances and pick up the slack for products that have been chosen but are not up to par. Finally, there's the risk of vendor lock-in: a technology choice was made some time ago and there's no way out of it. Essentially, the company is stuck with these providers, and either there's no alternative in the market or there's no reasonable migration path.
How can we overcome these challenges? Starting with the DevEx ones, we can put research, segmentation and experimentation in place. Start by building the smallest possible feature that can fulfill the requirements of the product teams, and iterate from there. This is akin to applying an agile mindset to platform teams. We can then borrow concepts from experience design, meaning that we should profile and understand our internal users: we should build personas around them, understand their goals, aspirations and needs, and base our future development on the outcome of this experience design. We can then have an internal developer portal, which puts all documentation in one place and allows developers to quickly create new repositories and services and deploy them to production. The goal is for platform teams to spend less time supporting product teams directly, and this can be accomplished with comprehensive documentation that details possible problems and how they can be solved. Finally, we can elicit continuous feedback from the development teams, for instance by having them score how helpful the support was whenever platform teams came to their help, and by asking for feedback on how things could be improved.
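As a minimal sketch of collecting that feedback scoring, the Python below assumes a hypothetical record of (product team, helpfulness score from 1 to 5, optional suggestion) gathered after each support interaction. It averages scores per team and flags teams whose average falls below an assumed threshold of 3.5; the record shape and threshold are illustrative, not something the talk prescribes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feedback record: (product team, helpfulness score 1-5, optional suggestion).
Feedback = tuple[str, int, str]


def summarize_feedback(entries: list[Feedback], threshold: float = 3.5) -> dict[str, dict]:
    """Average helpfulness scores per product team and flag teams below the threshold."""
    scores: dict[str, list[int]] = defaultdict(list)
    suggestions: dict[str, list[str]] = defaultdict(list)
    for team, score, suggestion in entries:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range for {team}: {score}")
        scores[team].append(score)
        if suggestion:
            suggestions[team].append(suggestion)
    return {
        team: {
            "average": round(mean(values), 2),
            "responses": len(values),
            "needs_attention": mean(values) < threshold,
            "suggestions": suggestions[team],
        }
        for team, values in scores.items()
    }


entries = [
    ("payments", 5, ""),
    ("payments", 4, "More runbooks for common errors"),
    ("search", 2, "Escalations take too long"),
    ("search", 3, ""),
]
print(summarize_feedback(entries))
```

Keeping the free-text suggestions next to the score matters: the number tells the platform team where to look, while the suggestions tell it what to improve.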
Moving on to the business challenges, something platform teams can do is to try to better understand the challenges of the organization and what its ultimate goals are. This can translate, for instance, into platform teams attending business all-hands, asking questions, and holding additional sessions with the product teams. Next, platform teams should make sure that the internal platform roadmap is aligned with the roadmaps of the products they serve. This means not only in terms of functionality, but also in terms of timeline and completeness of those features. We don't need to have something 100% ready from day one; we can deliver the functionality that is required from the beginning and then iterate on it. We need to provide a strong foundation that the product teams can then build upon. This foundation should revolve around practices such as good governance, security by default and SRE practices, and it should launch innovative initiatives such as FinOps and carbon footprint reduction.
Next, we can look at the set of challenges that are specific to technology. It's important to build a platform product as an ecosystem where different solutions integrate: different solutions, even ones coming from different vendors, should be perceived as one single product by the internal users. So we should be able to provide a seamless experience across these different products and services to our internal users. When we decide to use a product, a technology or a service, we need to have well-established evaluation techniques in place. This means that every time we go to the market, or look for open source software, to solve a specific business or technological need, those evaluation techniques should be in place, stable, and approved by the organization at large.
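One common way to make an evaluation technique stable and repeatable is a weighted scoring matrix. The sketch below is an assumption, not the speaker's method: it uses criteria along the lines of those the talk lists (enterprise grade, SLAs, extensibility, industry adoption, vendor reputation) with hypothetical weights and 1 to 5 ratings, and ranks candidate tools by their combined score.

```python
from dataclasses import dataclass

# Hypothetical weights; each organization would agree on its own criteria and weights.
WEIGHTS = {
    "enterprise_grade": 0.30,
    "sla": 0.25,
    "extensibility": 0.20,
    "industry_adoption": 0.15,
    "vendor_reputation": 0.10,
}


@dataclass
class Candidate:
    name: str
    ratings: dict[str, int]  # each criterion rated from 1 (poor) to 5 (excellent)


def weighted_score(candidate: Candidate, weights: dict[str, float] = WEIGHTS) -> float:
    """Combine per-criterion ratings into one comparable score."""
    missing = set(weights) - set(candidate.ratings)
    if missing:
        raise ValueError(f"{candidate.name} has no rating for {sorted(missing)}")
    return round(sum(weight * candidate.ratings[c] for c, weight in weights.items()), 2)


def rank(candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Return (name, score) pairs, highest weighted score first."""
    scored = [(c.name, weighted_score(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


tool_a = Candidate("tool-a", {c: 4 for c in WEIGHTS})
tool_b = Candidate("tool-b", {"enterprise_grade": 5, "sla": 5, "extensibility": 2,
                              "industry_adoption": 3, "vendor_reputation": 3})
print(rank([tool_a, tool_b]))
```

The value lies less in the arithmetic than in agreeing on the criteria and weights up front, so that every evaluation is run the same way and the organization can approve the method once rather than each decision ad hoc.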
We then need to treat the platform products as having their own life cycles, meaning that we need to establish specific criteria for when a new product should be created, when it should be maintained and when it's time to retire it. We should opt for vendor-agnostic technologies whenever possible. This could mean, for instance, using Terraform rather than CloudFormation or Bicep, because those are specific to a single cloud provider, while Terraform can also be used for other purposes, such as managing the Okta configuration. We need to cultivate relationships internally and externally and showcase the value delivered: all our internal and external stakeholders should understand the value the platform teams are providing, and the functionality offered should improve over time. It's also important to allocate some time to choose the right toolset and technology, to try the options out and settle on the best one, rather than picking the first one in a rush. The evaluation needs to reflect criteria such as enterprise grade, SLAs, extensibility, how widely the tool is used in the industry, and even the reputation of its producer. We need to make sure that the platform roadmap is tied to the organization and business strategy, so that the goals are aligned and we are seen contributing to the overall success of the organization at large. We should strive to make sure that development teams are able to work independently and don't need hand-holding from the platform teams. This can be achieved with a self-service portal and very effective onboarding. For instance, the North Star for onboarding could be that when a developer joins their team, they can commit and deploy to production within one day.
This is very aspirational for an organization, but it's something a company should work toward, together with providing playgrounds so that developers can try out new things without risking an incident in production.
Next, we can have a couple of zoom-ins on practices and product thinking. For platform teams to implement agile, first and foremost, it's important to have a clear definition of the client; most often the clients of a platform team are the internal development teams. It's important to understand the differences between these internal development teams: what needs and aspirations they have, what stack they use and what drives them. It's then important to work on new features with an MVP mindset, meaning that we should deliver a product that fulfills only the essential requirements asked for by the development teams, and then iterate and prioritize subsequent improvements. We'll most likely have competing priorities between different services and different teams, so we need to balance providing additional functionality with timely delivery. It's also important to measure success. How can success be measured for platform teams? It can mean that development teams are happy with the work we've been doing or, thinking about something more tangible, that they are able to go to production quicker, that they have fewer incidents in production, and that onboarding is smoother. So we have both qualitative and quantitative measures for this. Next, platform teams should have a clear definition of their software development life cycle. We can start with testing, meaning that platform teams should have a testing mindset and apply the same testing practices that are commonly used in development teams. The fact that infrastructure is harder to test shouldn't be an excuse, as many technologies such as Terraform have comprehensive frameworks that allow for unit, integration and end-to-end tests.
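The talk doesn't show what such a test looks like. As one hedged illustration of a fast, unit-level infrastructure check, the Python below inspects the JSON that `terraform show -json <planfile>` emits and lists planned resources missing required tags. It is a plain policy check, not Terraform's own `terraform test` framework or a library like Terratest, and the required-tags rule is a hypothetical organization policy.

```python
import json

# Hypothetical organization rule: every created or updated resource that supports tags
# must carry these keys.
REQUIRED_TAGS = {"owner", "cost-center"}


def untagged_resources(plan: dict, required: set[str] = REQUIRED_TAGS) -> list[str]:
    """Return addresses of planned resources that are missing required tags.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    """
    offenders = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if not {"create", "update"} & set(change.get("actions", [])):
            continue  # ignore no-op, read and delete-only changes
        after = change.get("after") or {}
        if "tags" not in after:
            continue  # this resource type does not expose a tags attribute
        tags = after["tags"] or {}
        if required - set(tags):
            offenders.append(rc["address"])
    return offenders


def load_plan(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# A trimmed, hand-written plan fragment for illustration.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"],
                    "after": {"tags": {"owner": "team-a", "cost-center": "42"}}}},
        {"address": "aws_s3_bucket.scratch",
         "change": {"actions": ["create"], "after": {"tags": {"owner": "team-b"}}}},
        {"address": "aws_s3_bucket.legacy",
         "change": {"actions": ["delete"], "after": None}},
    ]
}
assert untagged_resources(sample_plan) == ["aws_s3_bucket.scratch"]
```

In a pipeline, one would run `terraform plan -out=tfplan` and `terraform show -json tfplan > plan.json`, then fail the build if `untagged_resources(load_plan("plan.json"))` is non-empty. Checks like this form the cheap base of a test pyramid, with slower integration and end-to-end tests above them.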
So we can have a test pyramid in the same way we have for application code bases. Another strategy for better delivery is using feature toggles. They allow us to deploy new functionality to production from day one without the risk of breaking existing things, and also to showcase new functionality to specific users or roles. It's important for platform teams to provide abstractions to the development teams they serve, but it's also important to have the right level of abstraction in place. In software development there's the concept of a leaky abstraction, meaning that an abstraction leaks concepts from the underlying layer. This makes it much harder for users of the abstraction, who have to keep those underlying concepts in mind when using our library. Hitting the right level of abstraction is notoriously hard, but it's also extremely important. Having a clear definition of done is also essential for platform teams, so that it's clear when a new feature is complete. Is it complete when the code is pushed to master, when it's deployed, when a new library version is released, or when all the internal development teams are using the new version? We need to be able to measure who's using which versions and provide incentives to update to the latest version. Finally, we can apply ideas from Team Topologies, so that platform teams can act as enablers at times. They should be able to come in and help development teams adopt best practices, such as monitoring, instrumentation, effective deployments or an effective Kubernetes setup, whenever a development team is not able to do it on their own. The important part here is that this should be done on a case-by-case basis, with the goal of making the development team self-sufficient in those respects.
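A minimal sketch of feature toggles that target specific users or roles follows. It is an in-memory illustration under stated assumptions: the `Toggle` and `ToggleRegistry` classes and the `new-deploy-pipeline` toggle are hypothetical, and a real platform would typically load toggle state from configuration or a dedicated flag service rather than hard-coding it.

```python
from collections.abc import Set as AbstractSet
from dataclasses import dataclass, field


@dataclass
class Toggle:
    """A feature flag that is off by default and can be enabled for users or roles."""
    name: str
    enabled_for_all: bool = False
    users: set[str] = field(default_factory=set)
    roles: set[str] = field(default_factory=set)

    def is_enabled(self, user: str, user_roles: AbstractSet[str]) -> bool:
        return self.enabled_for_all or user in self.users or bool(self.roles & user_roles)


class ToggleRegistry:
    """In-memory registry of toggles, keyed by name."""

    def __init__(self) -> None:
        self._toggles: dict[str, Toggle] = {}

    def register(self, toggle: Toggle) -> None:
        self._toggles[toggle.name] = toggle

    def is_enabled(self, name: str, user: str, user_roles: AbstractSet[str] = frozenset()) -> bool:
        toggle = self._toggles.get(name)
        # Unknown toggles count as off, so code deployed ahead of release stays dark.
        return toggle is not None and toggle.is_enabled(user, user_roles)


registry = ToggleRegistry()
# Hypothetical toggle: a new deployment pipeline shown first to one user and to beta testers.
registry.register(Toggle("new-deploy-pipeline", users={"alice"}, roles={"beta-tester"}))


def pipeline_version(user: str, roles: AbstractSet[str] = frozenset()) -> str:
    return "v2" if registry.is_enabled("new-deploy-pipeline", user, roles) else "v1"


print(pipeline_version("alice"), pipeline_version("bob", {"beta-tester"}), pipeline_version("carol"))
# prints: v2 v2 v1
```

Because both code paths are already in production, widening the rollout is a configuration change (adding roles, or setting `enabled_for_all`) rather than a new deployment, and rolling back is equally cheap.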
Enablement should have an agreed-upon timeline, after which the development team is self-sufficient. It's also important that platform teams specialize and focus on providing core building blocks that product teams can then reuse and customize according to their needs, rather than providing the ultimate, finished solution. We can think of applying domain-driven design to platform domains as well: having this segregation in place allows us to better define what each platform team should be responsible for, and to avoid the painful and costly context switching that often happens in platform teams. We should aim for independent development streams, where the development teams can ship new features, solve production issues, and improve their internal codebases and deployments without intervention from the platform teams. We can also apply concepts from product and portfolio management to platform teams. Whenever we decide to build an abstraction on top of existing products, especially third-party products, we need to make sure that the abstraction makes sense and actually provides value. If not, we should reconsider and use the underlying implementation directly, which is most likely already well tested and documented. We should also think about how to abstract away organizational processes and rules: if we can provide strong configuration defaults that adhere to the organization's rules, it will make developers' lives much, much easier. We can also treat the products and services of the platform team as a service portfolio. Each product and service will have a different maturity and will be at a different lifecycle stage, so we should define product metrics for them: what their cost is, what level of adoption they have, how extensively they are used across the organization, and how much financial sense they make.
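To make those portfolio metrics concrete, here is a small sketch that tracks cost and adoption per platform product and expresses "financial sense" as a simple payback period (months until cumulative net benefit covers the upfront investment). The fields, the per-team saving estimate and all figures are hypothetical assumptions for illustration; payback period is just one simple way to frame ROI, not the speaker's prescribed formula.

```python
import math
from dataclasses import dataclass


@dataclass
class PlatformProduct:
    """Hypothetical portfolio entry; all money figures use one consistent currency."""
    name: str
    upfront_cost: float             # initial build investment
    monthly_run_cost: float         # hosting, licences and maintenance effort
    adopting_teams: int
    total_teams: int
    monthly_saving_per_team: float  # estimated effort saved for each adopting team

    @property
    def adoption_rate(self) -> float:
        return self.adopting_teams / self.total_teams if self.total_teams else 0.0

    @property
    def monthly_net_benefit(self) -> float:
        return self.adopting_teams * self.monthly_saving_per_team - self.monthly_run_cost

    def payback_months(self) -> float:
        """Whole months until cumulative net benefit covers the upfront cost; inf if never."""
        if self.monthly_net_benefit <= 0:
            return math.inf
        return float(math.ceil(self.upfront_cost / self.monthly_net_benefit))


portfolio = [
    PlatformProduct("ci-templates", 120_000, 2_000, 10, 25, 1_500),
    PlatformProduct("custom-secrets-wrapper", 80_000, 3_000, 1, 25, 1_500),
]
for product in portfolio:
    print(f"{product.name}: adoption {product.adoption_rate:.0%}, "
          f"payback {product.payback_months()} months")
```

In this made-up portfolio, the second product never pays back at its current adoption, which is exactly the kind of signal that should trigger a decision to pivot, change approach, or retire it.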
We should be able to define the ROI for these products from their inception, establishing how long it will take to recover the investment; if this return doesn't materialize, we can decide whether to pivot, change approach, or continue on the current trajectory. Having an enterprise foundation means providing a safety net for developers, making it harder to misconfigure a service and potentially disclose sensitive information or leave a door ajar for potential attackers. We should also evaluate how enterprise grade our internal platform solutions are, making sure that they comply with the SLAs and SLOs mandated by the company at large, and the products provided by the platform teams should be integrated into a seamless ecosystem, so that the developer experience is consistent.
So, we covered a lot of ground today. We talked about the most common challenges that platform teams face in their day-to-day operations, and then we analyzed how those challenges can be addressed, and even how platform teams can shift their mindset into a more agile and product-driven one. Thank you very much for attending.
...

Marco Pierobon

Lead Developer @ Thoughtworks
