DS0101EN: Introduction to Data Science course by edX

Hey peeps! Here is a course for all the engineering students out there who want to pave their way towards becoming data scientists. So gear up and fasten your seat belts. Let’s get into the world of data science.

In this course you will:

  • Meet people who work in data science
  • Explore definitions of data science
  • Learn about data science in a business context
  • Discover some use cases and applications of data science

Syllabus

Module 1 – Defining Data Science

  • What is Data Science?
    • Fundamentals of Data Science
    • The Many Paths to Data Science
    • Advice for New Data Scientists

Module 2 – What Data Scientists Do

  • A Day in the Life of a Data Scientist
    • Old problems, new problems, Data Science solutions
    • Data Science Topics and Algorithms
    • Cloud for Data Science

Module 3 – Data Science in Business

  • Foundations of Big Data
    • How Big Data is Driving Digital Transformation
    • What is Hadoop?
    • Data Science Skills and Big Data
    • Data Scientists at New York University

Module 4 – Use Cases for Data Science

  • What is the Difference?
    • Neural Networks and Deep Learning
    • Applications of Machine Learning

Exercise – Computer Vision with IBM Watson

Module 5 – Data Science in Business

  • How Data Science is Saving Lives
    • How Companies Should Get Started in Data Science
    • Applications of Data Science

Module 6 – Careers and Recruiting in Data Science

  • How Can Someone Become a Data Scientist
    • Recruiting for Data Science
    • Careers in Data Science
    • High School Students and Data Science Careers

MODULE 1:

Learning Objectives

In this module you will:

  • Hear from data science professionals to learn what data science is.
  • Learn about the many paths to data science.
  • Hear from data science professionals as they give advice to anyone who is passionate about data science.
  • Learn some statistics about the field of data science, the demand for data scientists, and some of the qualities of excelling data scientists.
  • Learn why data science is the sexiest job of the 21st century.

SUMMARY:

In this module, you have learned:

  • Data science is the study of large quantities of data, which can reveal insights that help organizations make strategic choices.
  • There are many paths to a career in data science; most, but not all, involve a little math, a little science, and a lot of curiosity about data.
  • New data scientists need to be curious, judgemental and argumentative.
  • Why data science is considered the sexiest job in the 21st century, paying high salaries for skilled workers.

MODULE 2:

Learning Objectives

In this module you will:

  • Hear from data scientists as they share with you what a typical day in the life of a data scientist looks like.
  • Hear from data scientists as they share with you what tools, algorithms, and technologies they use on a daily basis.
  • Hear from data scientists as they try to explain the term “cloud”.
  • Learn about data science, data scientists, and how each is defined.

In this module, you have learned:

  • The typical workday for a Data Scientist varies depending on what type of project they are working on.
  • Many algorithms are used to bring out insights from data. 
  • Accessing algorithms, tools, and data through the Cloud enables Data Scientists to stay up-to-date and collaborate easily.

MODULE 3:

In this module, you have learned:

  • How Big Data is defined by the Vs: Velocity, Volume, Variety, Veracity, and Value.
  • How Hadoop and other tools, combined with distributed computing power, are used to handle the demands of Big Data.
  • What skills are required to analyse Big Data. 
  • About the process of Data Mining, and how it produces results.

MODULE 4:

Learning Objectives

In this module you will:

  • Hear from Norman White, the Faculty Director of the Stern Centre for Research Computing, at New York University.
  • Hear from Norman White as he talks about data science and what skills are required for anyone interested in pursuing a career in this field.
  • Hear from Norman White as he explains some of the popular data science tools and algorithms.
  • Hear from Norman White as he gives advice to high school students in particular, and to anyone in general, who is looking to start a career in data science.
  • Learn about data mining and the steps that comprise the process of mining a given dataset.
  • Learn about regression and what questions can be put to regression analysis.

SUMMARY:

In this module, you have learned:

  • The differences between some common Data Science terms, including Deep Learning and Machine Learning.
  • Deep Learning is a type of Machine Learning that simulates human decision-making using neural networks.
  • Machine Learning has many applications, from recommender systems that provide relevant choices for customers on commercial websites, to detailed analysis of financial markets.
  • How to use regression to analyze data.
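
As a rough illustration of the regression idea mentioned above, the sketch below fits a straight line to a handful of points using ordinary least squares. The data values and the function name fit_line are made up for this example and do not come from the course.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # estimated score for 6 hours of study
print(slope, intercept, predicted)
```

The fitted line answers a typical regression question: how much does the outcome change, on average, when the input increases by one unit?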

MODULE 5:

Learning Objectives

In this module you will:

  • Learn about what companies need to do in order to start with data science.
  • Learn about some of the qualities that differentiate data scientists from other professionals.
  • Learn about some applications of data science.
  • Learn about analytics and what important role data scientists play in this process.
  • Learn about story-telling and the importance of an effective final deliverable.
  • Learn about the main components of an effective final deliverable.
  • Apply what you learned about data science to answer open-ended questions.
  • Demonstrate your understanding of the readings to define what data science and data scientist mean.
  • Demonstrate your understanding of the readings to answer a question about the final deliverable of a data science project.

SUMMARY:

In this module, you have learned:

  • Data Science helps physicians provide the best treatment for their patients, helps meteorologists predict the extent of local weather events, and can even help predict natural disasters like earthquakes and tornadoes.
  • That companies can start on their data science journey by capturing data. Once they have data, they can begin analysing it.
  • Some ways that data is generated by consumers. 
  • How businesses like Netflix, Amazon, UPS, Google, and Apple use the data generated by their consumers and employees.
  • The purpose of the final deliverable of a Data Science project is to communicate new information and insights from the data analysis to key decision-makers.

MODULE 6:

Learning Objectives

In this module you will:

  • Learn about what companies need to do in order to start with data science.
  • Learn about some of the qualities that differentiate data scientists from other professionals.
  • Learn about some applications of data science.
  • Learn about analytics and what important role data scientists play in this process.
  • Learn about story-telling and the importance of an effective final deliverable.
  • Learn about the main components of an effective final deliverable.
  • Apply what you learned about data science to answer open-ended questions.
  • Demonstrate your understanding of the readings to define what data science and data scientist mean.
  • Demonstrate your understanding of the readings to answer a question about the final deliverable of a data science project.

SUMMARY:

In this module, you have learned:

  • The length and content of the final report will vary depending on the needs of the project.
  • The structure of the final report for a Data Science project should include a cover page, table of contents, executive summary, detailed contents, acknowledgments, references, and appendices.
  • The report should present a thorough analysis of the data and communicate the project findings.

Data Management

Data management is the process of ingesting, storing, organizing, and maintaining an organization’s data. Effective data management is a critical component of deploying the IT systems that run business applications and provide analytical information to help corporate executives, business managers, and other end users drive operational decision-making and strategic planning. The data management process ensures that data in business systems is correct, available, and accessible. Most of the work is performed by IT and data management teams, although business users are involved in some parts of the process. This section explains what data management is and describes the various disciplines it encompasses.

History and Evolution

Early data management was driven primarily by IT professionals who focused on the problem of garbage in, garbage out: the earliest computers reached incorrect conclusions when they were fed erroneous or inadequate data. Beginning in the 1960s, industry groups and professional organisations promoted best practices for data management, particularly in terms of professional training and data quality criteria. That decade also saw the introduction of mainframe-based hierarchical databases.

The data warehouse concept was created in the late 1980s, and early adopters began deploying data warehouses in the mid-1990s. By the early 2000s, relational software was the dominant technology, with a virtual monopoly on database deployments. The emergence of big data and NoSQL alternatives has since given organizations a wider range of data management options.

Benefits of data management

By increasing operational performance and allowing improved decision-making, a well-executed data management strategy may help firms acquire potential competitive advantages over their business rivals. Organizations with well-managed data may also become more flexible, allowing them to more rapidly detect market trends and seize new business possibilities.

Effective data management may also assist businesses in avoiding data breaches, data privacy issues, and regulatory compliance difficulties that might harm their brand, add unanticipated expenses, and put them in legal trouble. Finally, the most significant benefit that a sound data management strategy can give is improved company performance.

Importance of data management

Data is increasingly viewed as a corporate asset that can be used to make better business decisions, improve marketing efforts, streamline operations, and reduce costs, all with the goal of boosting revenue and profits. However, a lack of proper data management can leave businesses with incompatible data silos, inconsistent data sets, and data quality problems that limit their ability to run business intelligence (BI) and analytics applications or, worse, lead to erroneous conclusions. Data management has also grown in importance as organisations are subjected to a growing number of regulatory compliance obligations, including data privacy and protection legislation such as GDPR and the California Consumer Privacy Act. Furthermore, organisations are gathering ever-increasing volumes of data and a broader variety of data types, both of which are hallmarks of the big data platforms that many have deployed. Without proper data management, such environments can become unwieldy and difficult to navigate.

Tasks and duties in data management

The data management process involves a wide range of tasks, responsibilities, and skills. In small firms with fewer resources, individual workers may take on several roles. In general, data management professionals include data architects, data modellers, database administrators (DBAs), database developers, data quality analysts and engineers, data integration developers, data governance managers, data stewards, and data engineers, who collaborate with analytics teams to build data pipelines and prepare data for analysis.

Data scientists and other data analysts may also handle some data management tasks themselves, particularly in big data systems containing raw data that must be filtered and prepared for specific applications. Similarly, application developers often help deploy and manage big data environments, which require new skills compared with relational database systems. As a result, businesses may need to hire new employees or retrain established DBAs to meet their big data management needs.

Globalization!

Globalization is the word used to describe the growing interdependence of the world’s economies, cultures, and populations, brought about by cross-border trade in goods and services, technology, and flows of investment, people, and information.

Globalization is driven by the convergence of political, cultural and economic systems that ultimately promote — and often necessitate — increased interaction, integration and dependency amongst nations.

The more that disparate regions of the world become intertwined politically, culturally and economically, the more globalized the world becomes.

These international interactions and dependencies are enabled and accelerated by advances in technology, especially in transportation and telecommunications. In general, money, technology, materials and even people flow more swiftly across national boundaries today than they ever have in the past. The flow of knowledge, ideas and cultures is expedited through Internet communications.

There are three types of globalization:

1. Economic globalization. This type focuses on the unification and integration of international financial markets, as well as multinational corporations that have a significant influence on international markets.
2. Political globalization. This type deals mainly with policies designed to facilitate international trade and commerce. It also deals with the institutions that implement these policies, which can include national governments as well as international institutions, such as the International Monetary Fund and the World Trade Organization.
3. Cultural globalization. This type focuses on the social factors that cause cultures to converge — such as increased ease of communication and transportation, brought about by technology.

Computers in detail

Definition

A computer is a machine that performs tasks and calculations according to a series of instructions, or programs. It consists of hardware (circuits, etc.) and software (including the operating system, the underlying software that controls the hardware when the user issues instructions).

Programming

A machine that does only one thing is like a music box that keeps playing the same tracks. Suppose its owners want the music box to play different music every time. To do that, they need to program the music box. This part of the history of computers is called “the history of programmable machines.” Put concisely: a programmable machine is one that can be ordered to do different things by someone who speaks its language.

The role of computers in daily life

Today’s computers are fast, small, and powerful. They save money, time, and labor: work that would otherwise take months or years can be completed in a few seconds. Computers are used for everything from launching satellites to running simple applications on our desktops. Storage capacity and speed vary with requirements. For example, NASA and other organizations use high-speed supercomputers because speed is an important part of their work. Computers have also become inexpensive, and in the form of smartphones they are always at hand.

At home, computers provide opportunities to access social networks, read books, or work remotely. In the office, they are the primary tool for getting work done. Computers help with almost everything, from shopping to work to taking notes or playing games, and people have come to rely on them heavily. They make life easier and faster than ever. Large storage capacity is of great help to today’s businesses: data received from different systems is stored on the computer for later use.

Previously, shopping, reading, working, doing arithmetic, making calls, scheduling appointments and events, and setting alarms all happened in different places. Now everything can be done on a smartphone. Carrying ten bulky books used to be a burden; now millions of books are within reach. There is no need to check calendars anymore, because they have been replaced by reminders that automatically notify us of important events and by alarms that only need to be set once. These are some of the basic tasks performed every day that would be difficult without a computer. Computers are no longer a luxury; they are available in different forms on different platforms, for example as smartphones. Computers are also very helpful in medicine, where their use has developed extensively in the past decade.

Computers can also be addictive. All in all, we can say that computers have profoundly affected our lives, and their influence now goes beyond convenience, habit, and popularity.

The above is only a small sample of what computers do, but as far as modern technology is concerned, computers have changed our way of life. They are no longer used only for arithmetic; they are now active in teaching, industry, process automation, data management, analysis, personal and group entertainment, music synthesis, professional photo and video editing, gaming, research, creating new software and applications that make life easier, shopping, banking, marketing, and even building artificial intelligence with machine learning and deep learning.

The Basics Of Database Management System

Data processing has undergone evolutionary changes in the past 30 years, and processing with a database management system offers a number of advantages. This section presents the basics of today’s dynamic database management systems. A review of the relevant professional magazines suggests that systems are now more user-friendly.
A database management system is a collection of interrelated data together with a set of programs to access the data; it is also called a database system, or simply a database. The primary goal of such a system is to provide an environment that is both convenient and efficient to use in retrieving and storing information.
A database management system (DBMS) is designed to manage a large body of information. Data management involves both defining structures for storing information and providing mechanisms for manipulating the information. In addition, the database system must provide for the safety of the stored information, despite system crashes or attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible anomalous results due to multiple users concurrently accessing the same data.
Examples of the use of database systems include airline reservation systems, company payroll and employee information systems, banking systems, credit card processing systems, and sales and order tracking systems.

A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained. Thereby, data can be stored in complex data structures that permit efficient retrieval, yet users see a simplified and easy-to-use view of the data. The lowest level of abstraction, the physical level, describes how the data are actually stored and details the data structures. The next-higher level of abstraction, the logical level, describes what data are stored, and what relationships exist among those data. The highest level of abstraction, the view level, describes parts of the database that are relevant to each user; application programs used to access a database form part of the view level.
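
The levels of abstraction described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a definitive design: the employee table, its columns, and the view name employee_directory are hypothetical. SQLite itself takes care of the physical level, the table definition stands in for the logical level, and the view plays the role of the view level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Logical level: what data are stored (hypothetical employee table).
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ada", 5200.0), (2, "Grace", 6100.0)],
)

# View level: a simplified view that hides the salary column from its users.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")

rows = conn.execute("SELECT * FROM employee_directory ORDER BY id").fetchall()
print(rows)
```

A user who queries only employee_directory sees names and ids without needing to know how, or even whether, salaries are stored.
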
The overall structure of the database is called the database schema. The schema specifies data, data relationships, data semantics, and consistency constraints on the data.
Underlying the structure of a database is the logical data model: a collection of conceptual tools for describing the schema.
The entity-relationship data model is based on a collection of basic objects, called entities, and of relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. For example, each person is an entity, and bank accounts can be considered entities. Entities are described in a database by a set of attributes. For example, the attributes account-number and balance describe one particular account in a bank. A relationship is an association among several entities. For example, a depositor relationship associates a customer with each of her accounts. The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and a relationship set, respectively.
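
The entity and relationship sets described above can be modelled loosely in plain Python. The sketch below is only an illustration under assumed names: the Customer and Account classes, the sample identifiers, and the accounts_of helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


@dataclass(frozen=True)
class Account:
    account_number: str
    balance: float


# Entity sets: all entities of the same type.
customers = {Customer("C-1", "Maria"), Customer("C-2", "Lena")}
accounts = {Account("A-101", 500.0), Account("A-215", 700.0), Account("A-305", 350.0)}

# Relationship set "depositor": (customer_id, account_number) pairs.
depositor = {("C-1", "A-101"), ("C-1", "A-215"), ("C-2", "A-305")}


def accounts_of(customer_id):
    """Return the account numbers linked to a customer through depositor."""
    return sorted(acc for cust, acc in depositor if cust == customer_id)


print(accounts_of("C-1"))
```

Each dataclass field corresponds to an attribute, and the depositor set records which customer is associated with which account.
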

The information in a database is stored on a nonvolatile medium that can accommodate large amounts of data; the most commonly used such media are magnetic disks. Magnetic disks can store significantly larger amounts of data than main memory, at much lower costs per unit of data.
To improve reliability in mission-critical systems, disks can be organized into structures generically called redundant arrays of independent disks (RAID). In a RAID system, data are organized with some amount of redundancy (such as replication) across several disks. Even if one of the disks in the RAID system were to be damaged and lose data, the lost data can be reconstructed from the other disks in the RAID system.
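
To make the reconstruction idea concrete, here is a small sketch of one redundancy scheme, XOR parity, which some RAID levels use in place of plain replication. Choosing parity here is an assumption made for illustration; the disk contents and the xor_blocks helper are hypothetical, and real RAID controllers work on much larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
disks = [b"DATA", b"BASE", b"SYST"]
parity = xor_blocks(disks)  # stored on a fourth disk

# Simulate losing disk 1, then rebuild it from the surviving disks plus parity.
survivors = [disks[0], disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

Because XOR-ing a value with itself cancels out, combining the surviving blocks with the parity block leaves exactly the missing block.
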
Data manipulation is the retrieval, insertion, deletion, and modification of information stored in the database. A data-manipulation language enables users to access or manipulate data as organized by the appropriate data model. There are basically two types of data-manipulation languages: Procedural data-manipulation languages require a user to specify what data are needed and how to get those data; nonprocedural data-manipulation languages require a user to specify what data are needed without specifying how to get those data.
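
The difference between the two styles can be sketched with Python's sqlite3 module. This is a loose analogy rather than a formal definition: the account table and its rows are hypothetical, and the Python loop merely stands in for a procedural language that spells out how to find the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500.0), ("A-215", 700.0), ("A-305", 350.0)],
)

# Procedural style: spell out how to find the rows, step by step.
procedural = []
for number, balance in conn.execute("SELECT account_number, balance FROM account"):
    if balance > 400:
        procedural.append(number)
procedural.sort()

# Nonprocedural (declarative) style: state what is wanted; the DBMS decides how.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account WHERE balance > 400 ORDER BY account_number"
    )
]

print(procedural, declarative)
```

Both approaches return the same accounts, but only the first one dictates the order of operations.
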
A query is a statement requesting the retrieval of information. The portion of a data-manipulation language that involves information retrieval is called a query language. Although technically incorrect, it is common practice to use the terms query language and data-manipulation language synonymously.
Database languages support both data-definition and data-manipulation functions. Although many database languages have been proposed and implemented, SQL has become a standard language supported by most relational database systems. Databases based on the object-oriented model also support declarative query languages that are similar to SQL.
SQL provides a complete data-definition language, including the ability to create relations with specified attribute types, and the ability to define integrity constraints on the data.
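
As a minimal sketch of data definition with integrity constraints, the example below uses SQLite through Python's sqlite3 module. The customer table, its columns, and the sample rows are hypothetical, and other database systems may word or enforce some of these constraints differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: attribute types plus integrity constraints.
conn.execute(
    """
    CREATE TABLE customer (
        customer_number INTEGER PRIMARY KEY,      -- no two rows may share a number
        name            TEXT NOT NULL,            -- a name is required
        credit_limit    REAL CHECK (credit_limit >= 0)
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Maria', 1000.0)")

rejected = []
bad_rows = [
    (1, "Lena", 500.0),   # duplicate customer_number
    (2, None, 500.0),     # missing name
    (3, "Omar", -50.0),   # negative credit limit
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])  # the database refused this row

print(rejected)
```

Every invalid row is refused by the database itself, so no application program has to repeat these checks.
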
Data Security:
The DBMS can prevent unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or to a subset of it known as a “subschema.” For example, in an employee database, some users may be able to view salaries while others may view only work history and medical data.
Data Integrity:
The DBMS can ensure that no more than one user can update the same record at the same time. It can keep duplicate records out of the database; for example, no two customers with the same customer number can be entered.
Intelligent Databases:
All DBMSs provide some data validation; for example, they can reject invalid dates or alphabetic data entered into money fields. But most validation is left up to the application programs.
Intelligent databases provide more validation; for example, table lookups can reject bad spelling or coding of items. Common algorithms can also be used such as one that computes sales tax for an order based on zip code.
When validation is left up to each application program, one program could allow an item to be entered while another program rejects it. Data integrity is better served when data validation is done in only one place. Mainframe DBMSs were the first to become intelligent, and all the others followed suit.
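
One way to keep validation in a single place, sketched here with SQLite through Python's sqlite3 module, is a lookup table enforced by a foreign key. The item and order_line tables, the item codes, and the add_order_line helper are all hypothetical. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Lookup table of valid item codes (hypothetical values).
conn.execute("CREATE TABLE item (code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO item VALUES (?)", [("WIDGET",), ("GADGET",)])

# Every order line must reference a code that exists in the lookup table.
conn.execute(
    "CREATE TABLE order_line ("
    " id INTEGER PRIMARY KEY,"
    " item_code TEXT NOT NULL REFERENCES item(code))"
)


def add_order_line(item_code):
    """Try to insert an order line; return True if the database accepted it."""
    try:
        conn.execute("INSERT INTO order_line (item_code) VALUES (?)", (item_code,))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_order_line("WIDGET"), add_order_line("WIDGTE"))
```

Any program that writes to order_line gets the same answer for a misspelled code, because the rule lives in the database rather than in each application.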

Why is SQL important? What problem is it solving?

Database administration or data management is incomplete without SQL. To use SQL comfortably as part of your administration or development work, you need to understand its basics, which will take you a long way in your career.

We start with an introduction to SQL itself and then look at the important features of SQL Server. This takes you through a demonstration of the internal workings of SQL, starting from SQL standards and evolution and progressing to creating tables, understanding and defining relationships, writing Transact-SQL commands, and so on.

You will also see that SQL is a special-purpose programming language. Unlike general-purpose programming languages such as C, C++, Java, or JavaScript, it has one particular purpose: the manipulation of data sets. This manipulation is grounded in what is known as relational calculus.

But isn’t studying SQL alone restrictive? It turns out it isn’t. Of course, we cannot use SQL on every kind of database or data source, but even where SQL cannot be used directly, most of today’s query languages have some relationship to SQL. In general, once you know SQL, other query languages are much easier to pick up.

Standards are vital because every relational database must build its implementation around the standard in order to ensure compatibility. This means that the learning curve is greatly reduced. SQL is standardized by both ANSI and ISO, which means that you have to learn the concepts only once.