The Canonical Data Fabric team is pleased to announce the first beta release of Charmed Spark, our solution for Apache Spark.
Apache Spark is a free, open source software framework for developing distributed, parallel processing jobs. It’s popular with data engineers and data scientists alike when building data pipelines for both batch and continuous data processing at scale. Engineers can write Python or Scala code to develop Spark jobs for ETL (extract-transform-load), analytics and machine learning.
Canonical is building a supported, packaged solution for running Spark jobs on Kubernetes. This beta release is the first milestone towards a comprehensive solution for Spark users.
The beta release includes features for:
- Submitting jobs to the cluster
- Managing job configuration
- Security-maintained container images
- A software operator to deploy and operate the Spark History Server
Charmed Spark is a part of Canonical Data Fabric, a set of solutions for data processing, with additional solutions to be announced.
Charmed Spark reference architecture
Users can deploy Charmed Spark to MicroK8s, Charmed Kubernetes and AWS Elastic Kubernetes Service (EKS). For deployment details, read the Charmed Spark reference architecture guide.
Share your feedback
At Canonical, we always value the community’s feedback about our products. Please try out Charmed Spark and send us your comments, bug reports and general feedback so we can take them into account in future releases.
To get started, head over to the Charmed Spark documentation pages and install the spark-client snap.
Chat with us at https://chat.charmhub.io/charmhub/channels/data-platform or file bug reports and feature requests on GitHub.