  • Validate, evolve, and control schemas in Amazon MSK and Amazon Kinesis Data Streams with AWS Glue Schema Registry
    by Brian Likosar on January 13, 2021 at 9:23 pm

    Data streaming technologies like Apache Kafka and Amazon Kinesis Data Streams capture and distribute data generated by thousands or millions of applications, websites, or machines. These technologies serve as a highly available transport layer that decouples the data-producing applications from data processors. However, the sheer number of applications producing, processing, routing, and consuming data can

  • Securing access to EMR clusters using AWS Systems Manager
    by Sai Sriparasa on January 12, 2021 at 8:44 pm

    Organizations need to secure infrastructure when giving engineers access to build applications. Opening SSH inbound ports on instances to enable engineer access introduces the risk of a malicious entity running unauthorized commands. Using a bastion host or jump server is a common approach to allow engineer access to Amazon EMR cluster instances by

  • Building complex workflows with Amazon MWAA, AWS Step Functions, AWS Glue, and Amazon EMR
    by Dipankar Ghosal on January 11, 2021 at 7:37 pm

    Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that makes it easy to run open-source versions of Apache Airflow on AWS and build workflows to run your extract, transform, and load (ETL) jobs and data pipelines. You can use AWS Step Functions as a serverless function orchestrator to build scalable

  • Introducing Amazon EMR integration with Apache Ranger
    by Varun Rao Bhamidimarri on January 8, 2021 at 8:22 pm

    Data security is an important pillar in data governance. It includes authentication, authorization, encryption, and audit. Amazon EMR enables you to set up and run clusters of Amazon Elastic Compute Cloud (Amazon EC2) instances with open-source big data applications like Apache Spark, Apache Hive, Apache Flink, and Presto. You may also want to set up multi-tenant EMR

  • Estimating scoring probabilities by preparing soccer matches data with AWS Glue DataBrew
    by Arash Rowshan on January 8, 2021 at 8:11 pm

    In soccer (or football outside of the US), players decide to take shots when they think they can score. But how do they make that determination vs. when to pass or dribble? In a fraction of a second, in motion, while chased from multiple directions by other professional athletes, they think about their distance from