Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey, good morning, good afternoon, and good evening to all.
I'm Vanta.
I have 23 years of experience delivering projects in various domains like healthcare, finance, and oil and gas. I have worked in AWS and Azure cloud environments, and I have extensive experience in big data frameworks like Hadoop, Spark, Kafka, et cetera.
Today I would like to share with you the experiences I have gone through in developing cross-language libraries. We will travel through the architectural patterns and methodologies followed while creating the Spark JMS Connector project, and their application to Rust ecosystem development.
Let's go to the agenda and see what are all the topics covered as part of this presentation.
First, we'll cover the project background. We'll see what is the need for the Spark JMS Connector, and what are all the goals and challenges we have faced while developing it.
Then we will travel through the common design challenges which we face while developing cross-language libraries.
Number three, we will see the abstractions and the interfaces of both the JVM provider interfaces and the Rust trait system. Next we will navigate to the error handling strategies, where we will see the JVM patterns and the Rust Result-based approach.
After that, we will see how we have covered the implementation with comprehensive testing across all the implementations, both in Rust and on the JVM. And last, we will see how we have created intuitive and discoverable interfaces with the documentation and API design.
We'll quickly go through the project background.
Yeah.
The Capital One Spark JMS Connector was created because the source is a message queue, and we have different message sources with different types of message queues. Some JMS providers deliver messages via ActiveMQ, some via IBM MQ, and some via Amazon SQS.
Number two, this JMS connector should support real-time processing, because we are going to read financial data and we need to respond to the end customer as soon as we receive the financial transaction from him.
Number three, this has to be the enterprise data connector for Apache Spark, because we are going to reuse this connector library across all the departments in the company.
Next, the importance of it: it has to power critical, real-time data pipelines, it has to be part of the fraud detection, and last but not least, it has to be performance sensitive with strict reliability requirements. You cannot lose the data; the customer gets annoyed if you lose the data.
The other challenge which we have seen is with the library's inputs. The inputs which we are getting for this library have different message patterns. We need to combine all these message patterns, make one unified message schema, and hand it to the different broker implementations.
With this, we have covered the departmental goals and the challenges which we are going to face by developing this library.
Next, we will go to the universal design challenges while creating a library.
A library should always have abstraction boundaries, because the implementation may change, and you cannot expect the user of this library to change his code because your implementation has changed. So there should be an abstraction layer between your implementation and the customer's code. We need to create interfaces that hide the implementation details to provide flexibility.
Number two, configuration management. You need to be able to configure things and customize the behavior of the library, and you should be able to enable or disable some of the features based on the configuration.
Number three, error propagation. You should be able to communicate your failures effectively, and you also need to preserve the context and the recovery options.
Next, extensibility. You should always leave space for future enhancements, but those future enhancements should not come at the cost of breaking backward compatibility.
And performance constraints: you should minimize the overhead while maintaining safety and correctness.
The first pattern we have identified is the provider pattern. The provider interface pattern we used in the JMS connector fortunately has a natural analog in the Rust trait system, so we were able to easily blend the JVM provider interface with Rust traits, as Rust traits provide zero-cost abstractions with compile-time polymorphism. On the JVM side, what we have done with the provider interface is implement the dependency injection pattern to swap the implementations based on the user's selection.
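To give a flavor of this, here is a minimal sketch of a provider trait on the Rust side. The names (MessageProvider, ActiveMqProvider, drain) are hypothetical illustrations, not the connector's actual API:

```rust
// Hypothetical provider trait: each broker backend implements this
// interface, and consumer code is generic over it.
trait MessageProvider {
    fn connect(&mut self, url: &str) -> Result<(), String>;
    fn receive(&mut self) -> Option<Vec<u8>>;
}

// One concrete provider; others (IBM MQ, SQS, ...) would follow the same shape.
struct ActiveMqProvider {
    connected: bool,
}

impl MessageProvider for ActiveMqProvider {
    fn connect(&mut self, url: &str) -> Result<(), String> {
        println!("connecting to {url}");
        self.connected = true;
        Ok(())
    }

    fn receive(&mut self) -> Option<Vec<u8>> {
        // Returns a dummy payload once connected (demo only).
        self.connected.then(|| b"payload".to_vec())
    }
}

// Generic over the provider: the compiler generates a specialized copy
// per concrete type, so the abstraction has no runtime dispatch cost.
fn drain<P: MessageProvider>(provider: &mut P) {
    if let Some(msg) = provider.receive() {
        println!("got {} bytes", msg.len());
    }
}

fn main() {
    let mut p = ActiveMqProvider { connected: false };
    p.connect("tcp://localhost:61616").unwrap();
    drain(&mut p);
}
```

Because drain is monomorphized per provider type, this mirrors the JVM's swap-by-injection behavior while keeping the dispatch at compile time.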
Now, the next architectural pattern is separation of concerns. We need component isolation patterns that transfer between the two ecosystems, so that those two ecosystems are cohesively connected.
In the JVM implementation, what we have done is isolate the message consumers from the connection management. Consumers don't know where the message came from, or from which message queue it came. In the same way, acknowledgement strategies are also separated from the message processing, so message processing doesn't know how to acknowledge; the connection takes care of the acknowledgement, and message processing does the processing work only.
Number three, error handlers are decoupled from the core business logic, so they are used as plug and play; error handlers can be added or removed based on the requirements. And configuration validation we have separated from its usage.
Now, from the Rust side, there is a built-in module system which naturally enforces the boundaries, and there is an ownership model which clarifies the responsibility for the resources. As I said earlier, we used trait objects for runtime polymorphism when needed and type parameters for compile-time checking, and the borrow checker also enforces the clean separation and prevents leaky abstractions.
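To make the isolation concrete, here is a hedged sketch of how the concerns could be split into separate Rust traits. The names (MessageProcessor, AckStrategy) are illustrative assumptions, not the connector's real types:

```rust
// Each concern lives behind its own trait, so implementations
// can be swapped independently (plug and play).
trait MessageProcessor {
    // Processing knows nothing about queues or acknowledgements.
    fn process(&self, payload: &[u8]);
}

trait AckStrategy {
    // Acknowledgement is decided separately from processing.
    fn acknowledge(&self, message_id: u64);
}

struct LogProcessor;
impl MessageProcessor for LogProcessor {
    fn process(&self, payload: &[u8]) {
        println!("processing {} bytes", payload.len());
    }
}

struct ImmediateAck;
impl AckStrategy for ImmediateAck {
    fn acknowledge(&self, message_id: u64) {
        println!("ack message {message_id}");
    }
}

// The connection layer wires the pieces together; neither trait
// implementation knows about the other.
fn handle(processor: &dyn MessageProcessor, ack: &dyn AckStrategy) {
    processor.process(b"payload");
    ack.acknowledge(42);
}

fn main() {
    handle(&LogProcessor, &ImmediateAck);
}
```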
Okay, now let's come to the error handling strategies.
So what we need to have is separate approaches for both the Rust translation and the JVM. For the JVM approach, as it is object oriented and relies on inheritance hierarchies, we wanted to follow the conventional style: we have created checked and unchecked exceptions, with exceptions thrown and caught everywhere for the errors to propagate.
And for every exception thrown, we emphasized detailed messages and stack traces, so we are able to identify the exact location of the issue and fix it. Also, while creating the exceptions and catching the exceptions, we didn't compromise on the performance; we considered the performance implications of exception throwing and catching.
Now, from the Rust translation point of view, we used Result, which encodes whether it is a success or a failure, forcing the caller to handle both cases. We used custom enums for cohesive and composable error types, and for contextual chaining we used the libraries anyhow and thiserror to provide ergonomic context addition. And for pattern matching, we have exhaustive match statements to ensure all the error cases are handled.
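As a hedged sketch of that Result-based style, assuming the thiserror and anyhow crates as dependencies (the error variants here are invented for illustration):

```rust
use anyhow::Context;
use thiserror::Error;

// Custom enum via thiserror: cohesive, composable error types
// with detailed messages baked into each variant.
#[derive(Debug, Error)]
enum ConnectorError {
    #[error("connection failed: {0}")]
    Connection(String),
    #[error("message could not be parsed")]
    Parse,
}

fn receive() -> Result<Vec<u8>, ConnectorError> {
    Err(ConnectorError::Connection("broker unreachable".into()))
}

fn run() -> anyhow::Result<()> {
    // anyhow adds human-readable context while preserving the source error.
    let payload = receive().context("while polling the queue")?;
    println!("got {} bytes", payload.len());
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        // Matching on the root cause: the compiler checks that every
        // ConnectorError variant is handled.
        match e.downcast_ref::<ConnectorError>() {
            Some(ConnectorError::Connection(msg)) => eprintln!("retryable: {msg}"),
            Some(ConnectorError::Parse) => eprintln!("drop message"),
            None => eprintln!("unexpected: {e:#}"),
        }
    }
}
```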
Now let's go to the comprehensive testing methodologies, where we will discuss the unit testing strategies, integration testing strategies, and property-based testing strategies which we have implemented to cover the test cases.
Yes. So for the JVM, we used Mockito for the interface mocking, and for Rust we have used mock implementations of the traits. For integration testing, we have used Testcontainers for the broker instances on the JVM; in the case of Rust, we have used container-based testing behind feature flags. And in the case of property-based testing, we have used QuickTheories or jqwik for the JVM, and for Rust we have used proptest and quickcheck.
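For example, a minimal property-based test in Rust with proptest might look like this. The round-trip property and the encode/decode helpers are hypothetical stand-ins for the connector's real message codec, and it assumes proptest as a dev-dependency:

```rust
// Hypothetical codec the property exercises: length-prefixed framing.
fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u32).to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

fn decode(frame: &[u8]) -> Option<Vec<u8>> {
    let len = u32::from_be_bytes(frame.get(..4)?.try_into().ok()?) as usize;
    frame.get(4..4 + len).map(|b| b.to_vec())
}

#[cfg(test)]
mod tests {
    use super::*;
    use proptest::prelude::*;

    proptest! {
        // Property: decoding an encoded payload returns the original
        // bytes, for arbitrary payloads up to 1 KiB.
        #[test]
        fn encode_decode_roundtrip(payload in proptest::collection::vec(any::<u8>(), 0..1024)) {
            prop_assert_eq!(decode(&encode(&payload)), Some(payload));
        }
    }
}
```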
Now let's move on to the configuration management patterns. The first thing is the JVM configuration approach. We have used the builder pattern with sensible defaults, we have used immutable configuration objects, we validated at construction time, and we used hierarchical configuration with overrides. So we have used the builder pattern with sensible defaults for the JMS configurations.
For the Rust translation, again, like in the JVM case, we have used the builder pattern together with the Default trait; it falls back to the default value if it doesn't find anything set in the builder. We used type-safe configuration with compile-time validation; for static configurations we used const generics, and we used config structures with validation functions.
Both these approaches emphasize type safety and validation before use, but the advantage of Rust over the JVM is that it can push more validation to compile time.
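A hedged sketch of that builder-with-defaults style in Rust; field names like broker_url and batch_size are illustrative, not the connector's real options:

```rust
// Immutable configuration object, validated at construction time.
#[derive(Debug, Clone)]
struct ConnectorConfig {
    broker_url: String,
    batch_size: usize,
}

// Builder with sensible defaults via the Default trait.
#[derive(Default)]
struct ConnectorConfigBuilder {
    broker_url: Option<String>,
    batch_size: Option<usize>,
}

impl ConnectorConfigBuilder {
    fn broker_url(mut self, url: &str) -> Self {
        self.broker_url = Some(url.to_string());
        self
    }

    fn batch_size(mut self, n: usize) -> Self {
        self.batch_size = Some(n);
        self
    }

    // Validation happens here, so an invalid config can never exist.
    fn build(self) -> Result<ConnectorConfig, String> {
        let broker_url = self.broker_url.ok_or("broker_url is required")?;
        let batch_size = self.batch_size.unwrap_or(100); // sensible default
        if batch_size == 0 {
            return Err("batch_size must be positive".into());
        }
        Ok(ConnectorConfig { broker_url, batch_size })
    }
}

fn main() {
    let config = ConnectorConfigBuilder::default()
        .broker_url("tcp://localhost:61616")
        .build()
        .expect("valid config");
    println!("{config:?}");
}
```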
Yeah.
Next, let's go to the performance considerations. We had different optimization approaches that achieved similar goals, and we have taken care of batch processing.
For the JVM optimizations: connection pooling and reuse, instead of creating the connections every time; careful memory management and reuse to reduce the GC pressure; and JIT-friendly code patterns to reduce the memory footprint.
For the Rust optimizations, we used monomorphization, which gives zero-cost abstractions; we used explicit memory management with lifetimes; we used compile-time evaluation when possible; and we used the ownership model, which achieves fearless concurrency.
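To illustrate what monomorphization and compile-time evaluation buy, here is a small hedged example (the function and constant are made up for the demo): the generic function is compiled separately for each concrete type, so the abstraction has no runtime dispatch cost.

```rust
// Generic over the message type: the compiler generates a specialized
// copy of this function for each T it is used with (monomorphization),
// so there is no virtual-call overhead.
fn checksum<T: AsRef<[u8]>>(msg: T) -> u32 {
    msg.as_ref().iter().map(|&b| b as u32).sum()
}

// Compile-time evaluation: computed once, at build time.
const HEADER_LEN: usize = 4 + 8;

fn main() {
    // Both calls go through statically dispatched, specialized code.
    println!("{}", checksum("hello"));         // T = &str
    println!("{}", checksum(vec![1u8, 2, 3])); // T = Vec<u8>
    println!("header is {HEADER_LEN} bytes");
}
```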
With this, we have achieved a performance improvement of ten times with zero runtime overhead, and we have hit a 99.9% reliability target.
Now let's move on to the documentation and API design.
First of all, during the coding we used consistent naming conventions; for example, we used camel case for the Java variables and snake case for the Rust variables. And we have established a clear, domain-specific terminology that remains consistent throughout the API.
We made simple use cases; we broke the complex use cases into simple use cases so that they'll be easy to implement. The advanced features are available, but they are not required for the basics, so both ecosystems benefit from tiered APIs with increasing complexity.
We have used both Rustdoc and Javadoc to support the embedded examples. Our JVM connector provided example classes for the users to start from, which worked better than the documentation alone, and in the Rust documentation we have used doc tests that are automatically verified.
We have explicitly documented the errors which can be thrown by the library and the recovery strategies in Rust. That means we have documented all the error variants that can be returned from each function.
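A short hedged example of a Rust doc test (the frame function is illustrative, and mylib stands in for the crate name): the example in the doc comment is compiled and run by cargo test, so the documentation can never drift from the code.

```rust
/// Frames a payload with a 4-byte big-endian length prefix.
///
/// # Errors
///
/// Returns an error if the payload exceeds `u32::MAX` bytes.
///
/// # Examples
///
/// ```
/// let framed = mylib::frame(b"hi").unwrap();
/// assert_eq!(framed, vec![0, 0, 0, 2, b'h', b'i']);
/// ```
pub fn frame(payload: &[u8]) -> Result<Vec<u8>, String> {
    let len = u32::try_from(payload.len()).map_err(|_| "payload too large".to_string())?;
    let mut out = len.to_be_bytes().to_vec();
    out.extend_from_slice(payload);
    Ok(out)
}
```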
What are the key learnings from this library implementation?
The first thing is that abstraction principles are universal across all the languages. Good interface design transcends languages; we can identify the right abstraction boundaries regardless of the language of implementation.
Number two, leverage the strengths of each language. Rust's ownership model and trait system provide compile-time guarantees that require runtime checks in the JVM languages.
Another one: testing is language agnostic. The test case implementations might be language dependent, but the testing strategies transfer between the ecosystems, even though the implementation details are different, as I said.
And document the intent, not just the implementation. While we are implementing the code, explaining why a design choice was made is as important for the documentation as how to use the API, so that while generating the documentation, you'll get all the information on how to use the API and what design choices you have made, regardless of the language.
Yeah.
Thank you.