Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey, good morning, good afternoon, and good evening to all.
I'm Vanta.
I have 23 years of experience delivering projects in various domains like healthcare, finance, and oil and gas. I have worked in AWS and Azure cloud environments, and I have extensive experience in big data frameworks like Hadoop, Spark, Kafka, et cetera.
Today I would like to share with you the experiences I have gone through in developing cross-language libraries. We will travel through the architectural patterns and methodologies followed while creating the Spark JMS Connector project, and their application to Rust ecosystem development.
Let's go to the agenda and see what are all the topics covered as part of this presentation.
First, we'll cover the project background. We'll see what is the need for the Spark JMS Connector, and what are all the goals and challenges we have faced while developing it.
Then we will travel through the common design challenges which we face while developing cross-language libraries.
Number three, we will see the abstractions and the interfaces of both the JVM provider interfaces and the Rust trait system. Next we will navigate to the error handling strategies, where we will see the JVM patterns and the Rust Result-based approach.
After that, we will see how we have covered the implementation with comprehensive testing across all the implementations, both in Rust and on the JVM. And last, we will see how we have created intuitive and discoverable interfaces with the documentation and API design.
We'll quickly go through the project background.
Yeah.
The Capital One Spark JMS Connector was created because the source is a message queue, and we have different message sources with different types of message queues. Some JMS providers deliver messages via ActiveMQ, some via IBM MQ, and some via Amazon SQS.
Number two, this JMS connector should support real-time processing, because we are going to read financial data and we need to respond to the end customer as soon as we receive the financial transaction from him.
Number three, this has to be the enterprise data connector for Apache Spark, because we are going to reuse this connector library across all the departments in the company.
Next, the importance of it: it has to power critical, real-time data pipelines, it has to be part of the fraud detection, and last but not least, it has to be performance sensitive with strict reliability requirements. You cannot lose the data; the customer gets annoyed if you lose the data.
The other challenge which we have seen is with the library's inputs. The inputs which we are getting for this library have different message patterns. We need to combine all these message patterns, make one unified message schema, and hand it to the different broker implementations.
With this, we have covered the departmental goals and the challenges which we are going to face by developing this library.
Next, we will go to the universal design challenges while creating a library.
A library should always have abstraction boundaries, because the implementation may change, and you cannot expect the user of this library to change his code because your implementation has changed. So there should be an abstraction layer between your implementation and the customer's code. We need to create interfaces that hide the implementation details to provide flexibility.
Number two, configuration management. You need to be able to configure things and customize the behavior of the library, and you should be able to enable or disable some of the features based on the configuration.
Number three, error propagation. You should be able to communicate your failures effectively, and you also need to preserve the context and the recovery options.
Next, extensibility. You should always leave space for future enhancements, but those future enhancements should not come at the cost of breaking backward compatibility.
And performance constraints: you should minimize the overhead while maintaining safety and correctness.
The first pattern we have identified is the provider pattern. The provider interface pattern we used in the JMS connector fortunately has a natural analog in the Rust trait system, so we were able to easily blend the JVM provider interface with Rust traits, as Rust traits provide zero-cost abstractions with compile-time polymorphism. On the JVM side, what we have done with the provider interface is implement the dependency injection pattern to swap the implementations based on the user's selection.
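To give a flavor of this, here is a minimal sketch of a provider trait on the Rust side. The names (MessageProvider, ActiveMqProvider, drain) are hypothetical illustrations, not the connector's actual API:

```rust
// Hypothetical provider trait: each broker backend implements this
// interface, and consumer code is generic over it.
trait MessageProvider {
    fn connect(&mut self, url: &str) -> Result<(), String>;
    fn receive(&mut self) -> Option<Vec<u8>>;
}

// One concrete provider; others (IBM MQ, SQS, ...) would follow the same shape.
struct ActiveMqProvider {
    connected: bool,
}

impl MessageProvider for ActiveMqProvider {
    fn connect(&mut self, url: &str) -> Result<(), String> {
        println!("connecting to {url}");
        self.connected = true;
        Ok(())
    }

    fn receive(&mut self) -> Option<Vec<u8>> {
        // Returns a dummy payload once connected (demo only).
        self.connected.then(|| b"payload".to_vec())
    }
}

// Generic over the provider: the compiler generates a specialized copy
// per concrete type, so the abstraction has no runtime dispatch cost.
fn drain<P: MessageProvider>(provider: &mut P) {
    if let Some(msg) = provider.receive() {
        println!("got {} bytes", msg.len());
    }
}

fn main() {
    let mut p = ActiveMqProvider { connected: false };
    p.connect("tcp://localhost:61616").unwrap();
    drain(&mut p);
}
```

Because drain is monomorphized per provider type, this mirrors the JVM's swap-by-injection behavior while keeping the dispatch at compile time.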
Now, the next architectural pattern is separation of concerns. We need component isolation patterns that transfer between the two ecosystems, so that those two ecosystems are cohesively connected.
In the JVM implementation, what we have done is isolate the message consumers from the connection management. Consumers don't know where the message came from, or from which message queue it came. In the same way, acknowledgement strategies are also separated from the message processing, so message processing doesn't know how to acknowledge; the connection takes care of the acknowledgement, and message processing does the processing work only.
Number three, error handlers are decoupled from the core business logic, so they are used as plug and play; error handlers can be added or removed based on the requirements. And configuration validation we have separated from its usage.
Now, from the Rust side, there is a built-in module system which naturally enforces the boundaries, and there is an ownership model which clarifies the responsibility for the resources. As I said earlier, we used trait objects for runtime polymorphism when needed and type parameters for compile-time checking, and the borrow checker also enforces the clean separation and prevents leaky abstractions.
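To make the isolation concrete, here is a hedged sketch of how the concerns could be split into separate Rust traits. The names (MessageProcessor, AckStrategy) are illustrative assumptions, not the connector's real types:

```rust
// Each concern lives behind its own trait, so implementations
// can be swapped independently (plug and play).
trait MessageProcessor {
    // Processing knows nothing about queues or acknowledgements.
    fn process(&self, payload: &[u8]);
}

trait AckStrategy {
    // Acknowledgement is decided separately from processing.
    fn acknowledge(&self, message_id: u64);
}

struct LogProcessor;
impl MessageProcessor for LogProcessor {
    fn process(&self, payload: &[u8]) {
        println!("processing {} bytes", payload.len());
    }
}

struct ImmediateAck;
impl AckStrategy for ImmediateAck {
    fn acknowledge(&self, message_id: u64) {
        println!("ack message {message_id}");
    }
}

// The connection layer wires the pieces together; neither trait
// implementation knows about the other.
fn handle(processor: &dyn MessageProcessor, ack: &dyn AckStrategy) {
    processor.process(b"payload");
    ack.acknowledge(42);
}

fn main() {
    handle(&LogProcessor, &ImmediateAck);
}
```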
Okay, now let's come to the error handling strategies.
So what we need to have is separate approaches for both the Rust translation and the JVM. For the JVM approach, as it is object oriented and relies on inheritance hierarchies, we wanted to follow the conventional style: we have created checked and unchecked exceptions, with exceptions thrown and caught everywhere for the errors to propagate.
And for every exception thrown, we emphasized detailed messages and stack traces, so we are able to identify the exact location of the issue and fix it. Also, while creating the exceptions and catching the exceptions, we didn't compromise on the performance; we considered the performance implications of exception throwing and catching.
Now, from the Rust translation point of view, we used Result, which encodes whether it is a success or a failure, forcing the caller to handle both cases. We used custom enums for cohesive and composable error types, and for contextual chaining we used the libraries anyhow and thiserror to provide ergonomic context addition. And for pattern matching, we have exhaustive match statements to ensure all the error cases are handled.
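As a hedged sketch of that Result-based style, assuming the thiserror and anyhow crates as dependencies (the error variants here are invented for illustration):

```rust
use anyhow::Context;
use thiserror::Error;

// Custom enum via thiserror: cohesive, composable error types
// with detailed messages baked into each variant.
#[derive(Debug, Error)]
enum ConnectorError {
    #[error("connection failed: {0}")]
    Connection(String),
    #[error("message could not be parsed")]
    Parse,
}

fn receive() -> Result<Vec<u8>, ConnectorError> {
    Err(ConnectorError::Connection("broker unreachable".into()))
}

fn run() -> anyhow::Result<()> {
    // anyhow adds human-readable context while preserving the source error.
    let payload = receive().context("while polling the queue")?;
    println!("got {} bytes", payload.len());
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        // Matching on the root cause: the compiler checks that every
        // ConnectorError variant is handled.
        match e.downcast_ref::<ConnectorError>() {
            Some(ConnectorError::Connection(msg)) => eprintln!("retryable: {msg}"),
            Some(ConnectorError::Parse) => eprintln!("drop message"),
            None => eprintln!("unexpected: {e:#}"),
        }
    }
}
```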
Now let's go to the comprehensive testing methodologies, where we will discuss the unit testing strategies, integration testing strategies, and property-based testing strategies which we have implemented to cover the test cases.
Yes. So for the JVM, we used Mockito for the interface mocking, and for Rust we have used mock implementations of the traits. For integration testing, we have used Testcontainers for the broker instances on the JVM; in the case of Rust, we have used container-based testing behind feature flags. And in the case of property-based testing, we have used QuickTheories or jqwik for the JVM, and for Rust we have used proptest and quickcheck.
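For example, a minimal property-based test in Rust with proptest might look like this. The round-trip property and the encode/decode helpers are hypothetical stand-ins for the connector's real message codec, and it assumes proptest as a dev-dependency:

```rust
// Hypothetical codec the property exercises: length-prefixed framing.
fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

fn decode(frame: &[u8]) -> Option<Vec<u8>> {
    let len = u32::from_be_bytes(frame.get(..4)?.try_into().ok()?) as usize;
    frame.get(4..4 + len).map(|b| b.to_vec())
}

#[cfg(test)]
mod tests {
    use super::*;
    use proptest::prelude::*;

    proptest! {
        // Property: decoding an encoded payload returns the original
        // bytes, for arbitrary payloads up to 1 KiB.
        #[test]
        fn encode_decode_roundtrip(payload in proptest::collection::vec(any::<u8>(), 0..1024)) {
            prop_assert_eq!(decode(&encode(&payload)), Some(payload));
        }
    }
}
```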
Now let's move on to the configuration management patterns. The first thing is the JVM configuration approach. We have used the builder pattern with sensible defaults, we have used immutable configuration objects, we validated at construction time, and we used hierarchical configuration with overrides. So we have used the builder pattern with sensible defaults for the JMS configurations.
For the Rust translation, again, like in the JVM case, we have used the builder pattern together with the Default trait; it falls back to the default value if it doesn't find anything set in the builder. We used type-safe configuration with compile-time validation; for static configurations we used const generics, and we used config structures with validation functions.
Both these approaches emphasize type safety and validation before use, but the advantage of Rust over the JVM is that it can push more validation to compile time.
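A hedged sketch of that builder-with-defaults style in Rust; field names like broker_url and batch_size are illustrative, not the connector's real options:

```rust
// Immutable configuration object, validated at construction time.
#[derive(Debug, Clone)]
struct ConnectorConfig {
    broker_url: String,
    batch_size: usize,
}

// Builder with sensible defaults via the Default trait.
#[derive(Default)]
struct ConnectorConfigBuilder {
    broker_url: Option<String>,
    batch_size: Option<usize>,
}

impl ConnectorConfigBuilder {
    fn broker_url(mut self, url: &str) -> Self {
        self.broker_url = Some(url.to_string());
        self
    }

    fn batch_size(mut self, n: usize) -> Self {
        self.batch_size = Some(n);
        self
    }

    // Validation happens here, so an invalid config can never exist.
    fn build(self) -> Result<ConnectorConfig, String> {
        let broker_url = self.broker_url.ok_or("broker_url is required")?;
        let batch_size = self.batch_size.unwrap_or(100); // sensible default
        if batch_size == 0 {
            return Err("batch_size must be positive".into());
        }
        Ok(ConnectorConfig { broker_url, batch_size })
    }
}

fn main() {
    let config = ConnectorConfigBuilder::default()
        .broker_url("tcp://localhost:61616")
        .build()
        .expect("valid config");
    println!("{config:?}");
}
```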
Yeah.
Next, let's go to the performance considerations. We had different optimization approaches that achieved similar goals, and we have taken care of batch processing.
For the JVM optimizations: connection pooling and reuse, instead of creating the connections every time; careful memory management and reuse to reduce the GC pressure; and JIT-friendly code patterns to reduce the memory footprint.
For the Rust optimizations, we used monomorphization, which gives zero-cost abstractions; we used explicit memory management with lifetimes; we used compile-time evaluation when possible; and we used the ownership model, which achieves fearless concurrency.
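To illustrate what monomorphization and compile-time evaluation buy, here is a small hedged example (the function and constant are made up for the demo): the generic function is compiled separately for each concrete type, so the abstraction has no runtime dispatch cost.

```rust
// Generic over the message type: the compiler generates a specialized
// copy of this function for each T it is used with (monomorphization),
// so there is no virtual-call overhead.
fn checksum<T: AsRef<[u8]>>(msg: T) -> u32 {
    msg.as_ref().iter().map(|&b| b as u32).sum()
}

// Compile-time evaluation: computed once, at build time.
const HEADER_LEN: usize = 4 + 8;

fn main() {
    // Both calls go through statically dispatched, specialized code.
    println!("{}", checksum("hello"));         // T = &str
    println!("{}", checksum(vec![1u8, 2, 3])); // T = Vec<u8>
    println!("header is {HEADER_LEN} bytes");
}
```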
With this, we have achieved a performance improvement of ten times with zero runtime overhead, and we have hit a 99.9% reliability target.
Now let's move on to the documentation and API design.
First of all, during the coding we used consistent naming conventions; for example, we used camel case for the Java variables and snake case for the Rust variables. And we have established a clear, domain-specific terminology that remains consistent throughout the API.
We made simple use cases; we broke the complex use cases into simple use cases so that they'll be easy to implement. The advanced features are available, but they are not required for the basics, so both ecosystems benefit from tiered APIs with increasing complexity.
We have used both Rustdoc and Javadoc to support the embedded examples. Our JVM connector provided example classes for the users to start from, which worked better than the documentation alone, and in the Rust documentation we have used doc tests that are automatically verified.
We have explicitly documented the errors which can be thrown by the library and the recovery strategies in Rust. That means we have documented all the error variants that can be returned from each function.
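A short hedged example of a Rust doc test (the frame function is illustrative, and mylib stands in for the crate name): the example in the doc comment is compiled and run by cargo test, so the documentation can never drift from the code.

```rust
/// Frames a payload with a 4-byte big-endian length prefix.
///
/// # Errors
///
/// Returns an error if the payload exceeds `u32::MAX` bytes.
///
/// # Examples
///
/// ```
/// let framed = mylib::frame(b"hi").unwrap();
/// assert_eq!(framed, vec![0, 0, 0, 2, b'h', b'i']);
/// ```
pub fn frame(payload: &[u8]) -> Result<Vec<u8>, String> {
    let len = u32::try_from(payload.len()).map_err(|_| "payload too large".to_string())?;
    let mut out = len.to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    Ok(out)
}
```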
What are the key learnings from this library implementation?
The first thing is that abstraction principles are universal across all the languages. Good interface design transcends languages; we can identify the right abstraction boundaries regardless of the language of implementation.
Number two, leverage the strengths of each language. Rust's ownership model and trait system provide compile-time guarantees that require runtime checks in the JVM languages.
Another one: testing is language agnostic. The test case implementations might be language dependent, but the testing strategies transfer between the ecosystems, even though the implementation details are different, as I said.
And document the intent, not just the implementation. While we are implementing the code, explaining why a design choice was made is as important for the documentation as how to use the API, so that while generating the documentation, you'll get all the information on how to use the API and what design choices you have made, regardless of the language.
Yeah.
Thank you.