The core knowledge base and skill set of an individual data role can be delineated according to the responsibilities that role actually takes on. A data science role may nevertheless be expected to have a more comprehensive knowledge structure and capability set, even though its actual responsibilities may be much more specific.
For example, a data scientist may focus on predictive modeling; however, he or she would usually be required to understand deeply, and be able to undertake, analytics and model development at the descriptive, diagnostic, predictive, and prescriptive levels.
Figure 1 illustrates the core categories of comprehensive knowledge and skills that may be expected for data innovation and services, both of a data science team and of a senior data science role. They are data science thinking, theoretical foundation, technical skills, work practice, communication, management, and leadership.
Data science thinking, which supports the data science objective of "think with data". It requires thinking traits and habits, as well as the corresponding knowledge and skills, to enable creative thinking, critical thinking, especially cognitive thinking, imaginary thinking, inferential thinking, reduction, abstraction, and summarization, together with research methods and decision sciences. Refer to Section 3 for further discussion of data science thinking, in particular creative thinking and critical thinking in data science.
Theoretical foundation, consisting of knowledge of the relevant theories in disciplines and areas that include statistics, mathematics, understanding of data characteristics, data representation and modeling, similarity and metric learning, algorithms and models, qualitative analysis, quantitative analysis, computing/computational science, complexity analysis, evaluation methods and enhancement, and meta-analysis and meta-synthesis.
Technical skills, composed of skills and techniques for data preparation, data exploration, data mining, document analysis, machine learning, pattern recognition, information retrieval, data management, data engineering, analytics programming, high performance computing, networking and communication, operations research, human-machine interaction, visualization and graphics, software engineering, and system analysis and design.
Work practice, including the practical components needed to conduct real-life data infrastructure and architecture building; data management, retrieval, storage, and backup; data manipulation and processing, various levels of business-oriented analytics, and optimization; experimental design, simulation, and evaluation; project development and management; and case studies, applications, prototyping, and deployment.
Communication, consisting of presentation, storytelling, reporting, documentation, group collaboration, teamwork, seminars and workshops, reflection, and refinement, directed at different stakeholders such as end users, business operators, project owners, and decision-makers.
Management, covering governance, organization, projects, resources, data quality, roles, responsibilities, risk, impact, privacy, security, social and professional issues, and deployment and decisions.
Leadership, playing the role of senior executives who make decisions on strategic questions such as: What competitive and business value, benefits, and impact can come from undertaking a data science project? What essential elements are required to undertake data science projects effectively and efficiently? How should a data science team be set up? Where do the potential risks lie, and how can they be mitigated? What would better data science practice look like? Where are the gaps in staff members' further work and career development in data science? These questions involve high-level decision science, strategic planning, business valuation, best practice, and optimization.
Note: Excerpted from "Longbing Cao. Understanding Data Science, Springer, 2018."