Transcript
This transcript was autogenerated.
Hello everyone and welcome. I'm really excited to be here. Today we
will go through how we built the platform as a product to serve internal
and external customers with high service-level expectations, and how we can use
ChatOps and GitHub bots to boost developer productivity
and to serve engineering and non-engineering teams in parallel at Mattermost.
So before we do a deep dive, we need to understand what Mattermost is, as
some concepts are based on it. Mattermost is
an open source platform that provides secure collaboration for technical and
operational teams. Now we'll go
through the Mattermost Cloud story and how we built a
SaaS platform. Back in the day, we had the idea to offer
Mattermost as a SaaS product, so we built Mattermost Cloud and initiated
the whole software development lifecycle. So now
we'll go through the lifecycle of Mattermost Cloud and identify,
for each phase, what the details were,
what was happening around it, and the decisions we made.
First we started with planning, trying to identify
the team's current capacity. Do we need to hire more
people? Do we have the right skill set? Is the current team structure
enough for the responsibilities? What will be the cost
of developing, running, and operating such a platform,
and what are the milestones to launch Mattermost Cloud?
At that time we had adopted the DevOps collaboration
model, with each team specializing where needed but also sharing where
necessary, so there was collaboration between them.

Then we passed to the next phase, which was the requirements. In this phase
we did a deep dive to understand the
ideal customer profile (who's going to use our product), the features,
the user flows, the SLOs, and the SLAs. We needed to support our
SLA agreements with customers, which meant we needed incident response, we
needed to focus on scalability and performance to host thousands
of Mattermost workspaces and customers,
and we needed security and compliance. At that time we realized that
we needed to introduce an SRE team, and that the collaboration
between dev and SRE would happen around operational criteria: when the
SRE team is happy with the code and the production
readiness reviews have been approved, they are able to support it
in production. So we introduced four new teams
based on the requirements and the planning we had. SRE is
responsible for building and operating a reliable, scalable, secure, and
cost-efficient SaaS platform. Automation and Tooling
is the team that builds the automation and the tools to orchestrate
and manage the fleet of customer workspaces.
Delivery is responsible for CI/CD and the whole release
lifecycle, including the release cadence for
our cloud customers. And we came up with the strategy:
okay, we're going to build a SaaS platform that will be used by
customers. They're going to host their Mattermost workspaces in
our managed infrastructure. So we moved to the
design phase and tried to identify a couple of things there: what would be
the technology stack, the core components, the system architecture,
the security controls, compliance and data retention, and the testing.
That's a couple of things, and not only these; this is just a high-level view.
Some examples of the decisions: we were already on AWS,
so we decided to use a managed database, RDS Aurora. We
decided to use Kubernetes to easily host thousands of
Mattermost workspaces and customers, and the Mattermost Operator to simplify
the deployment of Mattermost workspaces. Then there are the customer services and the customer
portal to interact with the customers, and the provisioner and the fleet controller, which we're
going to discuss in a few minutes.
So we came up with this high-level architecture. What you can see
here is the whole SaaS platform. We have
a command and control Kubernetes cluster and a couple of worker Kubernetes clusters,
which host the customer workspaces. The
flow starts with customers visiting the front end, which is the customer portal.
The customer portal interacts with the customer services, which instruct the
provisioner, the central point of management for Mattermost
Cloud resources. The provisioner in turn commands the
Mattermost Operator in the worker Kubernetes cluster
to create a Mattermost workspace for a specific customer.
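To make the provisioner-to-operator hand-off concrete, here is a minimal Python sketch under stated assumptions. The `WorkerCluster` record and the least-loaded selection policy are hypothetical (the talk does not say how the provisioner chooses a worker cluster). The manifest follows the Mattermost Operator's `Mattermost` custom resource as we understand it; the spec is deliberately minimal, and field names should be checked against the operator documentation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkerCluster:
    """Hypothetical record the provisioner keeps for each worker cluster."""
    name: str
    installations: int
    max_installations: int


def pick_cluster(clusters: list[WorkerCluster]) -> WorkerCluster:
    """Choose the least-loaded worker cluster that still has room (assumed policy)."""
    candidates = [c for c in clusters if c.installations < c.max_installations]
    if not candidates:
        raise RuntimeError("no worker cluster has free capacity")
    return min(candidates, key=lambda c: c.installations / c.max_installations)


def mattermost_manifest(workspace: str, domain: str, version: str, replicas: int = 1) -> dict:
    """Render a Mattermost custom resource for the operator to reconcile.

    apiVersion and kind follow the operator's v1beta1 CRD as we understand it;
    verify the spec fields against the operator docs before relying on them.
    """
    return {
        "apiVersion": "installation.mattermost.com/v1beta1",
        "kind": "Mattermost",
        "metadata": {"name": workspace, "namespace": workspace},
        "spec": {
            "replicas": replicas,
            "version": version,
            "ingress": {"host": f"{workspace}.{domain}"},
        },
    }
```

For example, with one cluster at 95 of 100 installations and another at 40 of 100, `pick_cluster` returns the second, and the rendered manifest would then be applied to that cluster for the operator to reconcile.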
We also see the fleet controller, which is a service responsible for
tracking Mattermost workspaces and making tweaks such
as hibernating or deleting inactive workspaces. So it's
mostly about housekeeping.

As
we discussed a few minutes before, we also identified the testing strategy, and
we realized that testing the SaaS platform at scale would need a lot
of effort, would be really complex, and would be hard to automate.
It would be cost-inefficient, because we'd run multiple
workspaces with daily users and so on,
and it would increase the time to market, while it was really important for
us to launch Mattermost Cloud fast. One of our core
principles at Mattermost is to use our own product
to complete our mission. This is
what we call dogfooding, right? Put simply,
if you actually use your product, you understand exactly
what it's like to use it the way your users do. And this is when
our motto as a group was created:
build once and for all. This
is also where we defined our mission as a group. The Infrastructure
group's mission at Mattermost is to provide the SaaS platform as a
product that helps internal and external users, by guaranteeing
that we operate an enterprise-grade platform and by building a
platform with self-serve powers.
So we defined our testing strategy, and we decided that
we should not over-automate: we would prioritize only the customer
experience and some of the critical paths for automation around
integration and end-to-end testing, and use the
SaaS platform as a golden path to dogfood
the platform with testing envs, support
envs, and demo envs. We will see how we
did that. So the new strategy was to offer the
SaaS platform to two kinds of customers: the external customers we
had before, and internal customers, such as
developers, sales, support, and DevRel. Think
of the internal customers as early adopters: they get the
latest and greatest of the whole platform fast,
so we can test everything we deliver, all the changes we make, on
a real platform, identical to production,
in a test environment.
So the high-level architecture
has changed a bit. We still have the same components, but you will see that we
added a few more things. We have the customer control plane, which is
about customers interacting with the customer portal, and in
parallel we have the developers control plane, as you can see
on the left side, where engineers interact with GitHub, Mattermost, and
an internal developer platform (IDP) layer.
Apart from this, we have the non-developers control plane, where non-engineers
use Mattermost and interact with the internal developer
platform. This enables all of them to
use the same platform as a test
environment and run their own workspaces in
our clusters.

So let's start with our developers control plane goals.
We wanted to give them self-service capabilities, so they could
do things by themselves and run test environments
and dev environments on the platform. We wanted to
support both staff and open source contributors: we are an open source
company and we respect the open source
community a lot. The developer experience
should happen only with the tools we use daily, and we
should not introduce something new that increases the cognitive load
and the learning curve. And we wanted to make the platform and
the interaction as abstract as we can.

For the non-developers control plane, I
would say there are not many differences. The main
difference is that we support sales, support, and DevRel.
We still have the self-service capabilities, the user
experience should still happen with the tools we use daily, and, as an abstraction, the
SaaS platform should be agnostic. So we wanted
to provide a seamless experience. As we see, there are some common things.
We have Mattermost, where we collaborate, communicate,
and talk daily;
we use it most of our day. So both
groups, developers and non-developers, use Mattermost. But there
is one more tool, GitHub, and we wanted to
keep the same interfaces people already had. So we decided to use
GitHub bots for GitHub and, for Mattermost, the
ChatOps model, which we're going to discuss in a few minutes.

What are GitHub bots? GitHub bots, or apps, are used
to automate and improve workflows. They are small services that interact
with GitHub webhook events. For
that reason we created a GitHub bot
called Spinwick. It observes
GitHub labels through webhook events and
deploys Mattermost servers in Mattermost Cloud accordingly.
We have different types of test servers: using the customer portal through
CWS, the customer web services, and using the provisioner.
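Spinwick's label handling can be sketched as a pure function over GitHub's `pull_request` webhook payload, which carries an `action` (such as `labeled`, `unlabeled`, or `closed`), the affected `label`, and the `pull_request` object. The label strings and backend names below are assumptions for illustration; this is not Spinwick's actual code.

```python
from __future__ import annotations

# Hypothetical label strings: the talk names three kinds of test server
# (cloud, HA cloud, cloud + CWS) but not the exact label text.
LABEL_TO_BACKEND = {
    "Setup Cloud Test Server": "provisioner",
    "Setup HA Cloud Test Server": "provisioner-ha",
    "Setup Cloud + CWS Test Server": "cws",
}


def plan_actions(event: dict) -> list[tuple[str, str, int]]:
    """Translate a GitHub pull_request webhook payload into test-server actions.

    Each action is (verb, backend, pr_number). Unrelated events yield [].
    """
    action = event.get("action")
    pr = event["pull_request"]
    number = pr["number"]

    if action in ("labeled", "unlabeled"):
        backend = LABEL_TO_BACKEND.get(event["label"]["name"])
        if backend is None:
            return []  # some other label, e.g. "bug"
        verb = "create" if action == "labeled" else "destroy"
        return [(verb, backend, number)]

    if action == "closed":
        # Fires for both merged and unmerged PRs; the labels are still attached,
        # so tear down every test server those labels requested.
        return [
            ("destroy", LABEL_TO_BACKEND[label["name"]], number)
            for label in pr.get("labels", [])
            if label["name"] in LABEL_TO_BACKEND
        ]

    return []
```

Keeping the mapping pure makes it easy to test; a real bot would then call the provisioner or CWS for each returned action.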
If we zoom in on the IDP in the high-level architecture,
there is another cluster running a couple of services.
In this case we see Spinwick interacting with GitHub. When something
happens in GitHub, Spinwick listens for changes to the labels,
whether a label has been added or removed, and interacts back
with GitHub to provide
context to the user: what exactly is happening, what the
status is, and so on. Spinwick also
interacts with the customer web server, for the capabilities we discussed,
and with the provisioner. Both of them create
a test environment, a test workspace, in our test worker Kubernetes
clusters. Now let's see the self-serve
capabilities of Spinwick.
Here you can see a pull request that has been
raised, and we want to test it on a cloud test server. You can see
on the right side, where the arrow is, that
we have labels for setting up a cloud test server, an HA
cloud test server, or a cloud and CWS test
server. All of these trigger the flow
we discussed before in the high-level architecture, where
a cloud server is created. Let's see how
the whole interaction happens. You can see here that I
just added a label to set up a cloud
test server. For the experience,
and to make clear to the user that something is happening,
Spinwick replies with a comment on GitHub saying
that it is creating a new Spinwick test server using Mattermost Cloud.
About a minute later,
you will see another comment from Spinwick saying that the Mattermost test server
has been created successfully. It includes the access link,
which contains the name and the pull
request number, and the credentials, which are common
for everyone, so you can
log in to your Mattermost test server.
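Replying on the pull request is a plain GitHub REST call: pull requests are issues in the API, so conversation comments are created with `POST /repos/{owner}/{repo}/issues/{issue_number}/comments`. The sketch below only builds the request with the standard library; the message wording and the `ready_message` helper are hypothetical.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def comment_request(owner: str, repo: str, pr_number: int, body: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that posts a comment on a pull request.

    Pull requests are issues in GitHub's REST API, so PR conversation
    comments go through the issues endpoint.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ready_message(pr_number: int, server_url: str) -> str:
    """Hypothetical wording for the 'server is ready' reply; credentials omitted."""
    return f"Mattermost test server for PR #{pr_number} created successfully.\nAccess here: {server_url}"
```

Sending it is `urllib.request.urlopen(req)`, with a token that is allowed to comment on the repository.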
But of course it's not only about creation; it's also about removing
the environment. If I remove the label, Spinwick destroys the
server. If I merge or close the PR, the test server
is also destroyed. And here
is the good part, and why you
want one common SaaS platform serving both kinds of customers:
the fleet controller we discussed a few minutes ago in the high-level architecture.
Let's say that someone has a pull
request that has been open for
many days. A test server was created for it, and
it's still there. Now imagine that we have multiple pull requests
in that state: we would have
multiple test servers just sitting there, not doing anything.
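An inactivity policy like the fleet controller's (hibernate idle workspaces, then delete the ones that stay idle) can be sketched as a small state transition. The thresholds and the wake-up rule are assumptions; the talk only says these steps happen after a few days without activity.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from enum import Enum

# Hypothetical thresholds: the talk only says "after a few days".
HIBERNATE_AFTER = timedelta(days=3)
DELETE_AFTER_HIBERNATION = timedelta(days=7)


class State(Enum):
    RUNNING = "running"
    HIBERNATED = "hibernated"
    DELETED = "deleted"


def next_state(
    state: State,
    last_activity: datetime,
    hibernated_at: datetime | None,
    now: datetime,
) -> State:
    """One housekeeping pass for a single workspace."""
    if state is State.RUNNING and now - last_activity >= HIBERNATE_AFTER:
        return State.HIBERNATED
    if state is State.HIBERNATED and hibernated_at is not None:
        if last_activity > hibernated_at:
            # Assumed behaviour: activity after hibernation wakes it back up.
            return State.RUNNING
        if now - hibernated_at >= DELETE_AFTER_HIBERNATION:
            return State.DELETED
    return state
```

Running the same pass over both customer trials and PR test servers is what lets one controller handle housekeeping for both.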
The fleet controller is responsible for this housekeeping, just as
it is for the customer experience.
If there is no activity for a few days, the server
goes into a hibernated state, and if there is still no activity after that,
it is deleted. So the housekeeping is exactly
the same as what we offer our customers when
they do trials.

This made the engineers
a bit more hungry, and they wanted to automate a few more things.
So we created another bot, Mattermod,
which automates workflows in GitHub, again using labels,
slash commands in pull request comments,
and housekeeping labels on issues and PRs.
Let's see some examples. Again, it's self-serve, and we use
slash commands. When an open source contributor comes to Mattermost
and wants to contribute, they need
to sign the Mattermost contributor agreement. You can see
in the left image that someone has
raised a pull request and got a comment back saying that they
need to sign the Mattermost agreement, and once they have signed,
they can just run /check-cla to confirm that the CLA
is okay and green. You can see that someone has
signed and then commented /check-cla,
and automatically you see on the right
side a CLA status check from Mattermod that is green.
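The CLA flow boils down to recognizing the command and reporting a commit status through GitHub's "create a commit status" endpoint (`POST /repos/{owner}/{repo}/statuses/{sha}`), whose body takes `state`, `context`, and `description`. In this sketch, where the signer list comes from and the `cla/mattermost` context name are assumptions, not Mattermod's actual implementation.

```python
from __future__ import annotations

# Hypothetical context name for the status check.
CLA_CONTEXT = "cla/mattermost"


def is_check_cla(comment_body: str) -> bool:
    """True if the comment's first line is the /check-cla command."""
    lines = comment_body.strip().splitlines()
    return bool(lines) and lines[0].strip().lower() == "/check-cla"


def cla_status(authors: set[str], signers: set[str]) -> dict:
    """Build the JSON body for GitHub's create-commit-status endpoint.

    authors: GitHub logins of the PR's commit authors.
    signers: logins known to have signed the CLA (source is assumed).
    """
    missing = sorted(authors - signers)
    if missing:
        return {
            "state": "failure",
            "context": CLA_CONTEXT,
            "description": "CLA not signed by: " + ", ".join(missing),
        }
    return {"state": "success", "context": CLA_CONTEXT, "description": "CLA signed"}
```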
Mattermod has another
option, update branch, which updates a pull request with the
latest changes from the branch it targets. For example, if
my pull request targets main and I run /update-branch,
Mattermod merges the latest changes from main into it.
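GitHub exposes this operation directly: `PUT /repos/{owner}/{repo}/pulls/{pull_number}/update-branch` merges the latest base branch into the pull request's head branch, and the optional `expected_head_sha` makes the call fail if the branch moved in the meantime. The sketch pairs it with a hypothetical comment parser; Mattermod may well implement the merge differently.

```python
from __future__ import annotations

import json
import urllib.request


def parse_command(comment_body: str) -> tuple[str, list[str]] | None:
    """Return (command, args) if the comment's first line is a slash command."""
    stripped = comment_body.strip()
    if not stripped.startswith("/"):
        return None
    name, *args = stripped.splitlines()[0].split()
    return name, args


def update_branch_request(owner: str, repo: str, pr_number: int, head_sha: str, token: str) -> urllib.request.Request:
    """Build a request asking GitHub to merge the base branch into the PR branch.

    expected_head_sha makes GitHub reject the update if someone pushed to
    the PR after we read head_sha.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{pr_number}/update-branch"
    return urllib.request.Request(
        url,
        data=json.dumps({"expected_head_sha": head_sha}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

A bot would build the update request only when `parse_command` returns `("/update-branch", [])`; the same parser also yields arguments for commands such as a cherry-pick target branch.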
You can see an example here where someone
wrote an update-branch
comment, and Mattermod automatically
merged all the changes included in the main
branch. There are a few other things. There is also
a cherry-pick command, which is very handy for us
for releases: we want pull requests to be
cherry-picked into multiple other branches for bug fixes or
improvements. We also have slash commands to run end-to-end tests,
which run on the SaaS platform; that's another thing we try to do,
use the same SaaS platform to run the end-to-end tests.
And there is an end-to-end cancel command, which cancels the end-to-end
tests if something is really slow. There's also the housekeeping
part we discussed: if a pull request
has no activity for a
specific duration,
Mattermod automatically finds it and labels
it as stale, so it's easier for us to go through these pull
requests and see if there is something we need to add or check.
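Stale detection is a filter over open pull requests. The sketch assumes the `updated_at` and `labels` fields of GitHub's pull request objects; the 14-day window and the `lifecycle/stale` label name are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=14)  # hypothetical window
STALE_LABEL = "lifecycle/stale"   # hypothetical label name


def parse_github_time(value: str) -> datetime:
    """GitHub REST timestamps look like '2024-01-31T12:00:00Z' (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def prs_to_mark_stale(prs: list[dict], now: datetime) -> list[int]:
    """Return numbers of PRs idle for at least STALE_AFTER and not yet labeled."""
    stale = []
    for pr in prs:
        labels = {label["name"] for label in pr.get("labels", [])}
        if STALE_LABEL in labels:
            continue  # already marked, nothing to do
        if now - parse_github_time(pr["updated_at"]) >= STALE_AFTER:
            stale.append(pr["number"])
    return stale
```

The returned numbers could then be labeled through `POST /repos/{owner}/{repo}/issues/{issue_number}/labels`.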
As we mentioned a few minutes ago, one of our core
approaches at Mattermost was ChatOps. ChatOps is a collaboration
model that connects people, tools, process, and automation into
a transparent workflow. Just to understand what
exactly we're talking about: in a communication
platform like Mattermost, a collaboration platform, without
ChatOps, we communicate inside it about all the things we do together, whether
we are engineers or non-engineers, but we use something else to
trigger a workflow or a platform tool. With ChatOps, we
tried to make this seamless. Engineers
and non-engineers communicate in chat, and in Mattermost they
can interact with bots and send commands to
initiate workflows in the platform tools, so everything happens seamlessly
in one place, especially for non-engineers.
For the non-developers control plane this is really important, as Mattermost is the number
one thing they use daily: that's ChatOps with Mattermost.
Mattermost offers a bunch of options to extend functionality and customize the experience with ChatOps.
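Mattermost custom slash commands work by sending an HTTP request to a configured URL with form fields such as `token`, `user_name`, `command`, and `text`, then rendering the JSON reply (`response_type` of `ephemeral` or `in_channel`, plus `text`). Below is a minimal, framework-free handler sketch; the token value and the acknowledgement message are placeholders, and a real handler would start a workflow rather than only reply.

```python
from __future__ import annotations

import hmac
import json
from urllib.parse import parse_qs

# Placeholder: Mattermost shows the real token when the slash command is created.
EXPECTED_TOKEN = "replace-with-your-slash-command-token"


def handle_slash_command(form_body: str) -> tuple[int, str]:
    """Handle a form-encoded slash command request; return (HTTP status, JSON body)."""
    fields = {key: values[0] for key, values in parse_qs(form_body).items()}

    # Reject requests that do not carry the token Mattermost issued for this command.
    token = fields.get("token", "")
    if not hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode()):
        return 401, json.dumps({"text": "invalid token"})

    text = fields.get("text", "").strip()
    user = fields.get("user_name", "someone")
    reply = {
        # "ephemeral" is shown only to the caller; "in_channel" posts for everyone.
        "response_type": "ephemeral",
        "text": f"@{user}: starting '{text or 'help'}'",
    }
    return 200, json.dumps(reply)
```

The function can sit behind any HTTP server; keeping it free of a web framework makes the request-to-reply contract easy to see and test.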
There are slash commands and plugins. For our
case, we built the cloud plugin. The cloud bot
allows creating and managing a test environment on the SaaS platform
directly from Mattermost, using slash commands from
any channel, right? You can do this from any channel you are in,
or from a DM with the bot.
Just a small example: let's say
that we are going to create a workspace called conf42.
/cloud create conf42
is what we need to write in
Mattermost, and the bot replies that the installation
has been initiated. You will get a notification when it's ready, and you can
see the status of all cloud installations with /cloud
list. When everything is ready, the cloud bot
replies to you with a DM saying that the workspace
has been created and is ready, with the access URL
and the credentials you can use. You can also see something extra,
which is a part we didn't discuss in
this presentation: the monitoring control plane, where we offer
monitoring for everyone, so they can see their
workspace logs and the provisioner logs in case something went
wrong. But we didn't want to stop there.
We also wanted to offer the capability to configure
and compose your own environment, so you can do
a couple of things with /cloud. Say we create another one
called cloudwick, with a specific license; it can generate
some test data, use a specific size in terms of resources for
high availability, and a specific file store. This gives
the support team, the sales team,
or the DevRel team the ability to create different kinds of
environments based on the users they want
to target and the use cases and scenarios they have, right?
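The composable /cloud create options could be parsed with the standard library's argparse, as long as parse errors become chat replies instead of process exits. The flag names, sizes, and file store choices below are hypothetical stand-ins for the options described in the talk (license, test data, size, file store), not the cloud plugin's real syntax.

```python
import argparse
import shlex


class CommandError(ValueError):
    """Raised instead of exiting, so the bot can post the message back."""


class _ChatParser(argparse.ArgumentParser):
    # argparse normally prints usage and calls sys.exit() on bad input.
    def error(self, message):
        raise CommandError(message)


def build_parser() -> argparse.ArgumentParser:
    parser = _ChatParser(prog="/cloud", add_help=False)
    # Subparsers inherit the parser class, so they raise CommandError too.
    sub = parser.add_subparsers(dest="action", required=True)

    create = sub.add_parser("create", add_help=False)
    create.add_argument("name")
    # Hypothetical options mirroring the talk: license, test data, size, file store.
    create.add_argument("--license", default="enterprise")
    create.add_argument("--test-data", action="store_true")
    create.add_argument("--size", choices=["small", "ha"], default="small")
    create.add_argument("--filestore", choices=["s3", "local"], default="s3")

    sub.add_parser("list", add_help=False)
    return parser


def parse_cloud_command(text: str) -> argparse.Namespace:
    """Parse the text that follows /cloud, e.g. 'create cloudwick --test-data'."""
    try:
        argv = shlex.split(text)
    except ValueError as err:  # e.g. an unbalanced quote
        raise CommandError(str(err)) from err
    return build_parser().parse_args(argv)
```

For instance, `parse_cloud_command("create cloudwick --test-data --size ha")` yields `action="create"`, `name="cloudwick"`, `test_data=True`; an unknown size raises `CommandError`, whose message the bot can post back to the user.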
So now, if we go back to
the architecture and focus on the IDP part
and how all the control planes work together:
we still have the customer control plane,
where customers interact and create their own
managed Mattermost workspaces. On the left side we have two more
control planes, the developers control plane and the non-developers control
plane; both use Mattermost as an interface,
and engineers also use GitHub.
All of them interact with the IDP seamlessly, without knowing
exactly what's happening underneath. And now we can see that the IDP
consists of many services: Spinwick,
which we saw before, the cloud bot, Mattermod,
and a couple of other bots
doing similar things as small services with GitHub and Mattermost.
And this is our internal developer platform story, and
the story of how we built once
and for all, for both worlds: customers and
the internal organization.
Learnings: the flexibility and the
reusability of the same stack gave us much more confidence that
what we delivered daily, and the changes we made during the whole software development
lifecycle, were stable and
performed well, and they let us identify bugs and issues fast,
even unhealthy Mattermost
workspaces in the test environments, because we applied the same
SLO mentality to the test infrastructure
and the test SaaS platform. It was really important
for us, and a good learning, that the developer and user experience needs
to be seamless, instead of creating
something new that is outside
the daily usage of the people who
are going to interact with your platform. Run surveys, gather feedback,
and listen to your internal and external customers; this is how
the platform becomes better. And the last
thing, which we also mentioned, is to use your own product.
This is the only way to identify whether what you have built
is actually meaningful for the customers
and fits
your needs: if it's something you can use,
customers can probably use it too. Of course, this needs
other inputs in terms of feedback, SLOs, and other things.

So that's it. Thanks a lot, have a great rest
of the day, and have a great conference.