Dan Wolfson

2023-2025 Distinguished Visitor

Dan is a founder of Pragmatic Data Research Ltd., a consultancy specializing in accelerating digital transformations through innovative data architectures and governance. Dan is a contributor to the LF AI & Data Foundation's Egeria project. Egeria supports an open metadata ecosystem that enables the exchange and use of metadata across a broad variety of use cases, from self-service analytics to data governance, in a heterogeneous, federated and distributed environment.

In Q4 2021, Dan retired from IBM as a Distinguished Engineer and Director/CTO in the Weather Business Solutions group of IBM AI Applications, where he led the application of geospatial data and analytics to areas such as environmental intelligence, agriculture and utilities. He was responsible for technical leadership, product development and operations in this rapidly growing area, with particular interest in delivering insight into analytical and operational solutions and in curating and managing multi-petabyte systems. As an experienced technical executive and thought leader, Dan focuses on creating and implementing pragmatic systems that create new value from information.

A proven CTO and Development Director, Dan has combined technology development, team development and operations to build and deliver numerous ground-breaking products and technologies. Dan's last project before retiring from IBM was leading the collaboration between IBM Research and Development teams to shape, implement and operate the IBM Environmental Intelligence Suite, which gives customers insight into how weather, climate and geospatial analytics can enhance their sustainability, operations and planning. Previously, Dan was the CTO for the IBM InfoSphere brand, covering data integration, governance, and master, reference and metadata.

Dan has over 40 years of experience in research and commercial distributed computing, ranging over transaction and object-oriented systems, software fault tolerance, messaging, information integration, business integration, metadata management and database systems. Dan has served as a trusted advisor to customers on data architecture, information and governance strategies across financial services, manufacturing, agriculture and retail. He has co-authored numerous papers and books and is credited with over 60 patents. Dan was appointed an IBM Distinguished Engineer in 2003, and in 2010 the Association for Computing Machinery recognized him as an ACM Distinguished Engineer.




DVP term expires December 2025


Using Open Source to Accelerate Sustainability Initiatives

Addressing climate change and sustainability is one of the most urgent concerns we face. A key step in addressing these concerns is for organizations to measure, analyze and document their carbon emissions and climate risks. Sustainability and climate reporting is effectively being mandated by both governments and industry. This reporting, however, often requires significant investments of time and resources to design, develop and implement. The right data must be found, prepared and organized to support, for instance, the calculation of greenhouse gas emissions, and this can be a significant challenge. Open source projects such as Egeria can play a significant role in accelerating these initiatives. In this talk you will learn about:

  • Some of the key standards and programs for measuring and reporting on sustainability
  • Acceleration through open source projects
  • An overview of Egeria and how it can jumpstart sustainability projects
  • A deeper look at the Greenhouse Gas Protocol and carbon accounting, showing how Egeria can help organizations establish and deliver trusted sustainability reports

Open Lineage for Data Trust and Understanding

One of the most requested metadata use cases is lineage: the ability to understand the origin of your data and the processing (reformatting, enrichment, merging, …) it has gone through between its origin and your AI model. Lineage helps build trust in your model because it shows that you have used appropriate data. Many individual technologies provide some lineage support covering their own processing. Some data catalogs provide proprietary ways to gather lineage from many sources; however, this is expensive to implement and makes the lineage information available only through the data catalog. Now three open source projects from LF AI & Data have come together to create a truly open ecosystem for lineage. Egeria provides open metadata that describes the data sources, data structures, data profiling results and data pipelines. OpenLineage provides the event mechanism that records each time a data pipeline runs. Marquez provides visualization for lineage. In this talk you will learn about:

  • What lineage is and how it is used
  • What makes lineage difficult to collect and maintain
  • How the open ecosystem for lineage works
  • How you can use lineage in your data science tools (using Jupyter Notebooks as an example)
