DS0101EN: Introduction to Data Science course by edX

Hey peeps! Here is a course for all the engineering students out there who want to pave their way towards becoming data scientists. So gear up and fasten your seat belts. Let’s get into the world of data science.

In this course you will:

  • Meet people who work in data science
  • Explore definitions of data science
  • Learn about data science in a business context
  • Discover some use cases and applications of data science

Syllabus

Module 1 – Defining Data Science

  • What is Data Science?
    • Fundamentals of Data Science
    • The Many Paths to Data Science
    • Advice for New Data Scientists

Module 2 – What Data Scientists Do

  • A Day in the Life of a Data Scientist
    • Old problems, new problems, Data Science solutions
    • Data Science Topics and Algorithms
    • Cloud for Data Science

Module 3 – Data Science in Business

  • Foundations of Big Data
    • How Big Data is Driving Digital Transformation
    • What is Hadoop?
    • Data Science Skills and Big Data
    • Data Scientists at New York University

Module 4 – Use Cases for Data Science

  • What is the Difference?
    • Neural Networks and Deep Learning
    • Applications of Machine Learning

Exercise – Computer Vision with IBM Watson

Module 5 – Data Science in Business

  • How Data Science is Saving Lives
    • How Companies Should Get Started in Data Science
    • Applications of Data Science

Module 6 – Careers and Recruiting in Data Science

  • How Can Someone Become a Data Scientist
    • Recruiting for Data Science
    • Careers in Data Science
    • High School Students and Data Science Careers

MODULE 1:

Learning Objectives

In this module you will:

  • Hear from data science professionals to learn what data science is.
  • Learn about the many paths to data science.
  • Hear from data science professionals as they give advice to anyone who is passionate about data science.
  • Learn some statistics about the field of data science, the demand for data scientists, and some of the qualities of excelling data scientists.
  • Learn why data science is the sexiest job of the 21st century.

SUMMARY:

In this module, you have learned:

  • Data science is the study of large quantities of data, which can reveal insights that help organizations make strategic choices.
  • There are many paths to a career in data science; most, but not all, involve a little math, a little science, and a lot of curiosity about data.
  • New data scientists need to be curious, judgemental and argumentative.
  • Why data science is considered the sexiest job of the 21st century, with high salaries for skilled workers.

MODULE 2:

Learning Objectives

In this module you will:

  • Hear from data scientists as they share with you what a typical day in the life of a data scientist looks like.
  • Hear from data scientists as they share with you what tools, algorithms, and technologies they use on a daily basis.
  • Hear from data scientists as they try to explain the term “cloud”.
  • Learn about data science, data scientists, and how each is defined.

SUMMARY:

In this module, you have learned:

  • The typical workday for a Data Scientist varies depending on what type of project they are working on.
  • Many algorithms are used to extract insights from data.
  • Accessing algorithms, tools, and data through the Cloud enables Data Scientists to stay up-to-date and collaborate easily.

MODULE 3:

SUMMARY:

In this module, you have learned:

  • How Big Data is defined by the Vs: Velocity, Volume, Variety, Veracity, and Value.
  • How Hadoop and other tools, combined with distributed computing power, are used to handle the demands of Big Data.
  • What skills are required to analyse Big Data.
  • About the process of Data Mining, and how it produces results.

MODULE 4:

Learning Objectives

In this module you will:

  • Hear from Norman White, the Faculty Director of the Stern Center for Research Computing at New York University.
  • Hear from Norman White as he talks about data science and what skills are required for anyone interested in pursuing a career in this field.
  • Hear from Norman White as he explains some of the popular data science tools and algorithms.
  • Hear from Norman White as he gives advice to high school students in particular, and to anyone in general who is looking to start a career in data science.
  • Learn about data mining and the steps that comprise the process of mining a given dataset.
  • Learn about regression and what questions can be put to regression analysis.

SUMMARY:

In this module, you have learned:

  • The differences between some common Data Science terms, including Deep Learning and Machine Learning.
  • Deep Learning is a type of Machine Learning that simulates human decision-making using neural networks.
  • Machine Learning has many applications, from recommender systems that provide relevant choices for customers on commercial websites, to detailed analysis of financial markets.
  • How to use regression to analyze data.

MODULE 5:

Learning Objectives

In this module you will:

  • Learn about what companies need to do in order to start with data science.
  • Learn about some of the qualities that differentiate data scientists from other professionals.
  • Learn about some applications of data science.
  • Learn about analytics and what important role data scientists play in this process.
  • Learn about story-telling and the importance of an effective final deliverable.
  • Learn about the main components of an effective final deliverable.
  • Apply what you learned about data science to answer open-ended questions.
  • Demonstrate your understanding of the readings to define what data science and data scientist mean.
  • Demonstrate your understanding of the readings to answer a question about the final deliverable of a data science project.

SUMMARY:

In this module, you have learned:

  • Data Science helps physicians provide the best treatment for their patients, helps meteorologists predict the extent of local weather events, and can even help predict natural disasters like earthquakes and tornadoes.
  • That companies can start on their data science journey by capturing data. Once they have data, they can begin analysing it.
  • Some ways that data is generated by consumers.
  • How businesses like Netflix, Amazon, UPS, Google, and Apple use the data generated by their consumers and employees.
  • The purpose of the final deliverable of a Data Science project is to communicate new information and insights from the data analysis to key decision-makers.

MODULE 6:

Learning Objectives

In this module you will:

  • Learn about what companies need to do in order to start with data science.
  • Learn about some of the qualities that differentiate data scientists from other professionals.
  • Learn about some applications of data science.
  • Learn about analytics and what important role data scientists play in this process.
  • Learn about story-telling and the importance of an effective final deliverable.
  • Learn about the main components of an effective final deliverable.
  • Apply what you learned about data science to answer open-ended questions.
  • Demonstrate your understanding of the readings to define what data science and data scientist mean.
  • Demonstrate your understanding of the readings to answer a question about the final deliverable of a data science project.

SUMMARY:

In this module, you have learned:

  • The length and content of the final report will vary depending on the needs of the project.
  • The structure of the final report for a Data Science project should include a cover page, table of contents, executive summary, detailed contents, acknowledgments, references, and appendices.
  • The report should present a thorough analysis of the data and communicate the project findings.

Data Privacy

Data privacy, also known as information privacy, is a subset of data protection that deals with the proper handling of sensitive data. This includes personal data as well as other confidential data, such as financial data and intellectual property, and the goal is to comply with regulatory requirements while maintaining the data’s confidentiality and immutability. Security becomes crucial when it comes to protecting data from external and internal threats and to defining what digitally stored data may be shared and with whom. In practice, data privacy covers the components of the data sharing process, including how and where the data is stored, as well as the particular rules that apply to those procedures.

Data Sovereignty

Data sovereignty is the principle that digital data is subject to the laws of the country in which it is stored.

In response to the growing use of cloud data services and a perceived lack of security, many nations have enacted new legislation requiring data to be retained inside the country in which the client resides. Governments are currently concerned about data sovereignty because they want to prevent data from being stored outside of the original country’s borders. It can be difficult to ensure that data is stored exclusively in the host nation, and doing so frequently depends on the details of the Service Level Agreement with the Cloud Service Provider.

Data Privacy Importance

  1. Business Asset Management: Data is, without a doubt, a company’s most valuable asset. We live in a data economy, where businesses place a high value on gathering, sharing, and analysing data on their customers and users, particularly from social media. Transparency in how companies obtain consent to keep personal data, adhere to their privacy policies, and handle the data they gather is critical to establishing trust with consumers who regard privacy as a basic human right.
  2. Regulatory Compliance: Data management for regulatory compliance is arguably even more critical. Noncompliance with regulatory obligations on how a company collects, stores, and processes personal data can result in hefty penalties. If the company is hacked or hit by ransomware, the consequences in terms of lost income and consumer confidence may be considerably greater.

Elements of Data Privacy

Data privacy, often known as information privacy, is made up of three components:

  • The right of individuals to privacy and to control over their personal data.
  • Procedures for collecting, processing, managing, and sharing personal data appropriately.
  • Compliance with data protection regulations.

Technologies and Practices That Can Help You Protect User Data

There are numerous storage and management choices to select from when it comes to securing your data. You can use these solutions to control access, monitor activity, and respond to threats. Some of the most widely used practices and technologies are as follows:

  • Data loss prevention (DLP) is a set of techniques and technologies for preventing data from being stolen, lost, or accidentally destroyed. DLP systems frequently combine several techniques to defend against and recover from data loss.
  • Storage with built-in data protection: modern storage technology includes disk clustering and redundancy as standard features. Cloudian’s HyperStore, for example, offers up to 14 nines of durability, low-cost storage of large amounts of data, and quick access for low RTO/RPO (recovery time and recovery point objectives).
  • Firewalls are tools for monitoring and filtering network traffic. Firewalls can be used to guarantee that only authorised users can access or transmit data.
  • Authentication and authorization: controls that help verify credentials and ensure that user privileges are applied correctly. These controls are generally used in conjunction with role-based access control (RBAC) as part of an identity and access management (IAM) system.
  • Encryption transforms data content using an algorithm that can only be reversed with the correct key. Even if your data is stolen, encryption protects it against unauthorised access by rendering it unreadable.
  • Endpoint protection safeguards your network’s entry points, such as ports, routers, and connected devices. Endpoint security software generally allows you to monitor and filter traffic at the network perimeter as needed.
  • Data erasure reduces liability by removing information that is no longer required. This can be done once the data has been processed and analysed, or periodically when the data is no longer useful. Many compliance laws, such as GDPR, require the deletion of unneeded data.
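
To make two of the practices above a bit more concrete, here is a minimal Python sketch of a role-based access check and a retention-based data erasure step. Everything in it is an assumption made for illustration: the role names, the permission sets, the `Record` type, and the 365-day retention window are all hypothetical and do not come from any particular IAM product or regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical role-to-permission mapping; a real IAM system defines its own.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role grants the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())


@dataclass
class Record:
    """Hypothetical stored item with a timezone-aware creation time."""
    record_id: str
    created_at: datetime


def purge_expired(records, retention: timedelta, now: datetime):
    """Return only the records that are still inside the retention window."""
    cutoff = now - retention
    return [r for r in records if r.created_at >= cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    records = [
        Record("a", now - timedelta(days=400)),  # older than the window
        Record("b", now - timedelta(days=10)),   # still within the window
    ]
    kept = purge_expired(records, timedelta(days=365), now)
    print(is_allowed("analyst", "delete"))  # False
    print([r.record_id for r in kept])      # ['b']
```

Passing `now` in explicitly, rather than reading the clock inside `purge_expired`, keeps the erasure rule easy to test. In a real system the purge would also have to remove the data from backups and replicas, which this sketch does not attempt.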