DevConf.cz 2022 has ended


HPC & Big Data & Data Science
Friday, January 28
 

6:00pm CET

Building data pipelines for Anomaly Detection
Cloud-native applications. Multiple cloud providers. Hybrid cloud. Thousands of VMs and containers. Complex network policies. Millions of connections and requests in any given time window. This is the typical situation faced by a Security Operations Center (SOC) analyst every single day. In this talk, the speaker talks about the highly available and highly scalable data pipelines that he built for the following use cases:

- Denial of Service: A device in the network stops working.
- Data Loss: For example, a rogue agent in the network transmits IP data outside the network.
- Data Corruption: A device starts sending erroneous data.

The above can be solved through anomaly detection models. The main challenge here is the data engineering pipeline. With almost 7 billion events occurring every day, processing and storing them for further analysis is a significant challenge. The machine learning models (for anomaly detection) have to be updated every few hours, which requires the pipeline to create the feature store within a very short time window.
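
As a rough illustration of the first use case (a device that stops working), the sketch below flags an interval whose event count deviates strongly from a historical baseline. The z-score rule, the threshold of 3.0, and the sample counts are assumptions made for illustration; the abstract does not say which anomaly detection models the speaker uses.

```python
from statistics import mean, stdev


def fit_baseline(counts):
    """Summarise historical per-interval event counts as (mean, std)."""
    return mean(counts), stdev(counts)


def is_anomalous(count, baseline, threshold=3.0):
    """Return True when count is more than `threshold` std devs from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly constant history: any change at all is unusual.
        return count != mu
    return abs(count - mu) / sigma > threshold


# Hypothetical events-per-minute history for one device.
history = [100, 98, 103, 101, 97, 102, 99, 100]
baseline = fit_baseline(history)

device_went_silent = is_anomalous(0, baseline)   # device stops sending events
typical_minute = is_anomalous(101, baseline)     # ordinary traffic
```

In a pipeline like the one described, the baseline would be recomputed from the feature store each time the model is refreshed, rather than from a fixed list.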

The core components of the data engineering pipeline are:
- Apache Zookeeper
- Apache Kafka
- Apache Flink
- Apache Pinot
- Apache Spark
- Apache Superset

The event logs are stored in Pinot through a Kafka topic. Pinot supports an Apache Kafka-based indexing service for real-time data ingestion. Pinot has primitive capabilities to create sliding time window statistics. More complex real-time statistics are computed using Flink. Apache Flink is a stream-processing engine that provides high throughput and low latency. Spark jobs are used for batch processing. Superset is used as the BI tool for real-time visualization.
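
To make "sliding time window statistics" concrete, here is a minimal plain-Python sketch of the idea. It is not Pinot or Flink code: the class name, the 60-second window, and the assumption that events arrive in timestamp order are all illustrative choices.

```python
from collections import deque


class SlidingWindowStats:
    """Count, sum and mean of values seen in the last `window_seconds`.

    Assumes events are added in non-decreasing timestamp order.
    The window covers the half-open interval (now - window_seconds, now].
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        self._events.append((timestamp, value))
        self._total += value
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value

    def count(self):
        return len(self._events)

    def total(self):
        return self._total

    def mean(self):
        return self._total / len(self._events) if self._events else 0.0


# Hypothetical stream of (timestamp in seconds, bytes transferred).
stats = SlidingWindowStats(window_seconds=60)
for ts, value in [(0, 10), (30, 20), (61, 30)]:
    stats.add(ts, value)
```

After the third event, the one at timestamp 0 has left the 60-second window, so only the events at 30 and 61 contribute to the statistics.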

The speaker talks through the architectural decisions and shows how to build a modern real-time stream processing data engineering pipeline using the above tools.

Outline
  • The problem: overview

  • Architecture

  • Real-Time Processing

  • Anomaly Detection

  • Visualization

  • Demo



Session chairs: Andrei Veselov and Pavel Yadlouski

Speakers

Tuhin Sharma

Senior Principal Data Scientist, Red Hat
Tuhin Sharma is a Senior Principal Data Scientist at Red Hat in the Corporate Development and Strategy group. Prior to that, he worked at Hypersonix as an AI Architect. He also co-founded and has been CEO of Binaize, a website conversion intelligence product for Shopify. He received master’s...


Friday January 28, 2022 6:00pm - 6:50pm CET
Session Room 1
 
Saturday, January 29
 

11:30am CET

Preconditioners to scale Multi-physics Simulations
Preconditioners (PCs) are used to improve both the efficiency and the robustness of iterative techniques when solving very large linear systems in a Krylov subspace. However, determining which preconditioner to use with which equation or set of equations in a given multi-physics simulation requires combined knowledge of preconditioning, matrix techniques, types of matrices, Krylov subspaces, and iterative methods, among other foundations of linear algebra. The present work provides a benchmark of the most popular preconditioners available today, emphasising their respective performance in terms of time to solution of the Finite Element problem, memory usage, number of iterations, and the value of |R| achieved at convergence. The performance evaluation is made for the Compute Finite Strain Elastic Stress problem in 3D, using the University of Cambridge Research Computing Service (CSD3) and the Message Passing Interface (MPI) implementations that allow parallelisation. The benchmarking and scaling were done with MOOSE, which uses the Finite Element Method, at problem sizes on the order of a million Degrees of Freedom (DoF). Along with the preconditioners and KSP types, a variety of options were tested to optimise performance.
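
The effect of a preconditioner on a Krylov solver can be seen on a toy problem. The sketch below is an assumption-laden illustration, not the talk's MOOSE/PETSc setup: it runs the conjugate gradient method in plain Python on a small, badly scaled symmetric positive definite matrix, with and without a Jacobi (diagonal) preconditioner, and reports two of the metrics named above (number of iterations and the final residual norm |R|). The matrix, tolerance, and function names are invented for the example.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, precond=None, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    precond(r) applies the inverse preconditioner to r; None means no
    preconditioning. Returns (x, iterations, final residual norm).
    """
    apply_m = precond or (lambda r: list(r))
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_m(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    r_norm = b_norm
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        r_norm = math.sqrt(dot(r, r))
        if r_norm <= tol * b_norm:   # relative residual convergence test
            return x, k, r_norm
        z = apply_m(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter, r_norm


def jacobi(A):
    """Jacobi preconditioner: divide each residual entry by the diagonal."""
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]
    return lambda r: [d * ri for d, ri in zip(inv_diag, r)]


def build_test_matrix(n):
    """Tridiagonal SPD matrix whose diagonal grows from 1 to 1 + 10(n-1)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0 + 10.0 * i
        if i + 1 < n:
            A[i][i + 1] = A[i + 1][i] = -0.1
    return A


A = build_test_matrix(50)
b = [1.0] * 50
x_plain, it_plain, res_plain = conjugate_gradient(A, b)
x_pc, it_pc, res_pc = conjugate_gradient(A, b, precond=jacobi(A))
print(f"plain CG:  {it_plain} iterations, |R| = {res_plain:.2e}")
print(f"Jacobi CG: {it_pc} iterations, |R| = {res_pc:.2e}")
```

Because the diagonal of this matrix varies over several hundred times its smallest entry, rescaling by the diagonal brings the eigenvalues close together, and the preconditioned run needs fewer iterations. Real multi-physics systems rarely respond this cleanly, which is why a benchmark across preconditioners is useful.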

Session chairs: Justin Nixon and Michal Ruprich

Speakers

Julita Inca

HPC Software Specialist, UKAEA
Education:
- Systems Engineering in Peru, Callao's university.
- Computer Science Masters in Peru, PUCP's university.
- High Performance Computing Masters in the UK, Edinburgh's university.
- Red Hat Certified Professional 140-100-496
Latest Work Experiences:
- Member of the GNOME Foundation...



Saturday January 29, 2022 11:30am - 12:20pm CET
Session Room 4

12:30pm CET

Build your own social media analytics with Apache
Apache Kafka is more than just a messaging broker. It has a rich ecosystem of different components. There are connectors for importing and exporting data, different stream processing libraries, schema registries, and a lot more. The first part of this talk will explain the Apache Kafka ecosystem and how the different components can be used to load data from social networks and analyze it with stream processing and machine learning. The second part will show a demo running on Kubernetes, which will use Kafka Connect to load data from Twitter and analyze it using the Kafka Streams API. After this talk, attendees should better understand the full advantages of the Apache Kafka ecosystem, especially with a focus on Kafka Connect and the Kafka Streams API. They should also be able to use these components on top of Kubernetes.
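
The kind of stateful aggregation a stream-processing library performs on social media data can be sketched without any Kafka dependency. The example below is plain Python, not the Kafka Streams API (which is a Java library), and the choice of hashtag counting is an assumption; the abstract does not say which analysis the demo performs. It mimics the shape of a typical topology: split each record into keys, group by key, and keep a running count.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Split one tweet into lower-cased hashtags (the 'flat map' step)."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def count_hashtags(tweets):
    """Consume tweet texts one by one and keep a running count per hashtag."""
    counts = Counter()  # plays the role of the processor's state store
    for text in tweets:
        for tag in extract_hashtags(text):
            counts[tag] += 1
    return counts


# Hypothetical records, standing in for messages read from a topic.
tweets = [
    "Loving #Kafka and #kubernetes",
    "#kafka Streams demo",
    "no tags here",
]
counts = count_hashtags(tweets)
top_tag = counts.most_common(1)
```

In an actual deployment, a connector would write the tweets to a topic and the stream processor would maintain the counts continuously instead of over a fixed list.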

Session chairs: Justin Nixon and Michal Ruprich

Speakers

Jakub Scholz

Principal Software Engineer, Red Hat
Jakub is a Principal Software Engineer in the Messaging and IoT team. He has long-term experience in messaging and lately focuses mainly on Apache Kafka. He is one of the core maintainers of the Strimzi project, which delivers several operators and tools for running Apache Kafka...



Saturday January 29, 2022 12:30pm - 1:20pm CET
Session Room 4
 
  • Date: DevConf.cz 2022, Jan 28-29, 2022
  • Venue: hopin.to