Big Data at the National Science Foundation
The National Science Foundation (NSF) is a leader in supporting Big Data research efforts. These efforts are part of a larger portfolio of Data Science activities. NSF initiatives in Big Data and Data Science encompass Research, Cyberinfrastructure, Education and Training, and Community Building.
Big Data Fundamental Research
NSF research programs in Big Data cover algorithmic, statistical, and mathematical foundations of data science; new techniques, technologies, and methodologies, including hardware and software approaches; and innovative uses of data for scientific discovery and action.
Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA)
The BIGDATA program seeks novel approaches in computer science, statistics, computational science, and mathematics leading towards the further development of the interdisciplinary field of data science. The program also seeks innovative applications in domain science, including social and behavioral sciences, education, physical sciences, and engineering, where data science and the availability of big data are creating new opportunities for research and insights not previously possible. The solicitation invites two categories of proposals:
Foundations (BIGDATA: F): those developing or studying fundamental theories, techniques, methodologies, and technologies of broad applicability to big data problems, motivated by specific data challenges and requirements; and
Innovative Applications (BIGDATA: IA): those engaged in translational activities that employ new big data techniques, methodologies, and technologies to address and solve problems in specific application domains. Projects in this category must be collaborative, involving researchers from domain disciplines and one or more methodological disciplines, e.g., computer science, statistics, mathematics, or simulation and modeling.
Proposals are expected to be well motivated by specific big data problems in one or more science and engineering research domains. All proposals are expected to clearly articulate the big data aspect(s) that motivate the research. Innovative Applications proposals must provide clear examples of the impacts of the big data techniques, technologies and methodologies on applications in one or more domains.
In FY 2018, the BIGDATA program continues the cloud option that was introduced in FY 2017, in partnership with Amazon Web Services (AWS), Google Cloud, and Microsoft Azure (see Use of Cloud Resources, at the end of Section II, Program Description).
Before preparing a proposal in response to this BIGDATA solicitation, applicants are strongly urged to review other related programs and solicitations and contact the respective NSF program officers to identify whether those solicitations are more appropriate. In particular:
Proposals that focus exclusively on areas of biology supported by NSF’s Directorate for Biological Sciences (BIO) should be submitted to programs such as Advances in Biological Informatics that are managed by the BIO Division of Biological Infrastructure (DBI; https://www.nsf.gov/div/index.jsp?div=DBI);
Proposals specific to geosciences that respond to the community needs and requirements expressed by the geosciences community should consider the EarthCube program for Developing a Community-Driven Data and Knowledge Environment for the Geosciences (https://www.nsf.gov/geo/earthcube/);
For the development of robust and shared data- or software-centric cyberinfrastructure capabilities, applicants should consider the Cyberinfrastructure for Sustained Scientific Innovation - Data and Software program (CSSI; https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505505);
For computational and data science research not specifically addressing big data issues, applicants should consider the Computational and Data Enabled Science and Engineering program (CDS&E; http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504813);
For work that is focused more on scaling performance of software rather than data-related issues, applicants should consider the Scalable Parallelism in the Extreme program (SPX; https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505348);
Proposals that focus on research in mathematics or statistics that is not tied to a specific big data problem should be submitted to the appropriate program within NSF’s Directorate for Mathematical & Physical Sciences (MPS) Division of Mathematical Sciences (DMS); see a list of DMS programs at https://www.nsf.gov/funding/programs.jsp?org=DMS; and
Proposals that focus on research relevant to NSF’s Directorate for Computer and Information Science and Engineering (CISE) not tied to a specific big data problem should be submitted to the appropriate CISE program, including the core programs:
Computer and Network Systems (CNS) Core Programs: https://www.nsf.gov/pubs/2017/nsf17570/nsf17570.htm;
Computing and Communication Foundations (CCF) Core Programs: https://www.nsf.gov/pubs/2017/nsf17571/nsf17571.htm; and
Information and Intelligent Systems (IIS) Core Programs: https://www.nsf.gov/pubs/2017/nsf17572/nsf17572.htm.
Computational and Data-Enabled Science and Engineering (CDS&E)
Advanced computational infrastructure and the ability to perform large-scale simulations and accumulate massive amounts of data have revolutionized scientific and engineering disciplines. The goal of the CDS&E program is to identify and capitalize on opportunities for major scientific and engineering breakthroughs through new computational and data analysis approaches. The intellectual drivers may be in an individual discipline or they may cut across more than one discipline in various Directorates. The key identifying factor is that the outcome relies on the development, adaptation, and utilization of one or more of the capabilities offered by advancement of both research and infrastructure in computation and data, either through cross-cutting or disciplinary programs.
The CDS&E program welcomes proposals in any area of research supported through the participating divisions that address at least one of the following criteria:
· Promote the creation, development, and application of the next generation of mathematical, computational and statistical theories and tools that are essential for addressing the challenges presented to the scientific and engineering communities by the ever-expanding role of computational modeling and simulation and the explosion and production of digital experimental and observational data.
· Promote and encourage integrated research projects that create, develop and apply novel computational, mathematical and statistical methods, algorithms, software, data curation, analysis, visualization and mining tools to address major, heretofore intractable questions in core science and engineering disciplines, including large-scale simulations and analysis of large and heterogeneous collections of data.
· Encourage adventurous ideas that generate new paradigms and that create and apply novel techniques, generating and utilizing digital data in innovative ways to complement or dramatically enhance traditional computational, experimental, observational, and theoretical tools for scientific discovery and application.
· Encourage ideas at the interface between scientific frameworks, computing capability, measurements and physical systems that enable advances well beyond the expected natural progression of individual activities, including development of science-driven algorithms to address pivotal problems in science and engineering and efficient methods to access, mine, and utilize large data sets.
The CDS&E program is not intended to replace existing programs that make awards that involve computation and the analysis of large data sets. Rather, the CDS&E program is meant to fund awards that have a significant component of cyber development or cyber science that goes well beyond what would normally be included in these programs. PIs should ask for consideration and review as a CDS&E proposal only if the proposal addresses at least one of these additional cyber components. Any proposal submitted to the CDS&E program that does not satisfy at least one of the additional criteria listed above will be reviewed within the context of the individual program. A proposal that is requesting consideration within the context of CDS&E should begin the title with the identifying acronym "CDS&E:".
Supplement requests to existing awards within a program that address one of the points above will also be considered.
Directorate for Mathematical and Physical Sciences: The CDS&E program in MPS explicitly addresses the distinct intellectual and technological discipline lying at the intersection of applied mathematics, statistics, computer science, and the core science disciplines of astronomy, chemistry, physics, mathematics, and materials research. Proposals are expected to be relevant to mathematical and physical sciences.
Astronomy (AST): CDS&E encompasses those areas of inquiry where significant progress is critically dependent upon the application of new computational hardware, software, or algorithms, or upon the use of massive data sets. CDS&E encompasses fundamentally new approaches to large-scale simulation and to the analysis of large and heterogeneous collections of data, as well as research into the nature of algorithms and techniques that can be both enabled by data and enable more data-intensive research.
Chemistry (CHE): CDS&E encourages innovative approaches for new paradigms in algorithms, software design, and data techniques that impact chemistry research. Potential areas of focus include computational and data tool development for modeling, simulation, data processing and analysis, and data-driven chemical discovery. Successful submissions will produce new approaches to gaining fundamental chemical knowledge and understanding.
Materials Research (DMR): CDS&E includes projects that involve: the creative use of computation or high performance computing, particularly in conjunction with data-centric methods and data from simulation or experiment, to address fundamental, challenging problems of materials research, including the discovery or design of materials and materials systems with desired properties, the discovery or control of materials-related phenomena, or new states of matter; the creation, development, and application of computational and data-centric tools across fundamental materials research to advance fundamental understanding; and the creation and application of novel techniques that utilize digital data from experiment, simulation, or both in innovative ways to complement or dramatically enhance traditional computational, experimental, and theoretical methods to discover new materials or new materials-related phenomena, or to advance fundamental understanding of materials. Broader impacts may include a focus on infusing computation and data-centric approaches, and the use of the materials research community’s advanced cyberinfrastructure, into education in materials and materials-related disciplines.
Mathematical Sciences (DMS): CDS&E includes the creation, development, and application of the next generation of mathematical and statistical theories and tools that will be essential for addressing the challenges presented to the scientific and engineering communities by the ever expanding role of computational modeling and simulation on the one hand, and the explosion and production of digital and observational data on the other.
Physics (PHY): CDS&E includes ideas at the interface between scientific frameworks and computing capability that enable advances well beyond the expected natural progress of either activity. This includes development of science-driven algorithms and numerical models to address pivotal problems in physics and efficient methods to access and mine large data sets.
Directorate for Engineering: The CDS&E program in engineering recognizes the importance of engineering in CDS&E and vice versa. Many natural and built engineering processes, devices and/or systems require high fidelity simulations over disparate scales that can be interrogated, analyzed, modeled, optimized or controlled, and even integrated with experiments or physical facilities. This program accepts proposals that confront and embrace the host of research challenges presented to the science and engineering communities by the ever-expanding role of computational modeling and simulation on the one hand, and experimental and/or observational data on the other. The goal of the program is to promote the creation, development, and utilization of the next generation of theories, algorithms, methods, tools, and cyberinfrastructure in science and engineering applications.
Successful research supported by CDS&E in engineering will encompass all engineering and related disciplines that are potentially transformative and multidisciplinary and that address computational and/or data challenges. Proposals submitted to this program should draw on productive intellectual partnerships that synergistically capitalize upon knowledge and expertise in multiple fields or sub-fields in science or engineering and/or in multiple types of organizations. Proposals submitted to this program announcement should address the relevance of the proposed project to engineering.
Chemical, Bioengineering, Environmental and Transport (CBET): CDS&E in CBET includes the use of high performance and emerging computational tools and environments – beyond that supported by core programs – in advancing mathematical modeling, simulation and analysis to describe and analyze with greater fidelity, complexity and scale, engineering processes in chemical, biochemical and biotechnology systems, bioengineering and living systems, sustainable energy and environmental systems, and transport and thermal-fluids systems. Topics of special interest: 1) Advanced modeling and analysis for water resources, earth systems, built environments, sustainable manufacturing, energy systems, food systems, and regional, national and/or global material flows, 2) Innovative modeling methodologies for turbulent flows and for flows of complex fluids and suspensions, 3) Developing advanced modeling capabilities for thermal fluids and combustion, 4) Extending validated molecular and/or macro-molecular models to the prediction of applications-level engineering problems, 5) Molecular modeling approaches for protein-protein interactions in native states, estimating rates of fundamental biomolecular reactions, biomolecular recognition phenomena and others, 6) Developing modeling strategies to simulate protein folding and native state protein-protein interactions, and to analyze multi-level regulatory metabolic structures containing spatio-temporal variability, and 7) Molecular and multiscale modeling, model-predictive control and optimization of complex chemical processes.
Civil, Mechanical and Manufacturing Innovation (CMMI): CDS&E supports the development of novel computational and data tools and environments – beyond that supported by core programs – that build on existing cyberinfrastructure and enable major advances in CMMI communities. Topics of special interest: 1) Expanding and leveraging knowledge of biological processes and systems, including biomimicry or bio-emulation, for the design of smart materials, systems and infrastructure (for the purposes of this program, living tissues are considered to be self-designing smart materials), 2) Conducting multi-scale, multi-temporal and/or multi-physics-based modeling and design of materials systems and structures, including data-informed materials discovery through machine learning, visualization, or other computational or data science approaches, 3) Exploring and enhancing understanding of human cognitive, behavioral and social processes within engineered systems, and 4) Integrating modeling, simulation, visualization and computational data analysis to enhance understanding of complex dynamical systems.
Electrical, Communications and Cyber Systems (ECCS): CDS&E in ECCS includes the development of innovative computational algorithms and the application of high performance and emerging computational tools for high-fidelity modeling and simulations of electronic, photonic and electromagnetic devices, components and systems in order to advance the frontiers in electronics, communications and sensing.
Office of Advanced Cyberinfrastructure (OAC): CDS&E in OAC addresses research in cyberinfrastructure itself with the clear potential to impact multiple research disciplines through the development of the paradigms, algorithms and processes needed to provide general CDS&E solutions as part of comprehensive, integrated, sustainable and secure cyberinfrastructure.
Source: www.nsf.gov/cise/bigdata/