Common Names and Terms Associated with Statistical Analysis

 Here are some common names and terms associated with statistical analysis:

Descriptive Statistics

  1. Mean – The average of a set of numbers.
  2. Median – The middle value in a list of numbers.
  3. Mode – The most frequently occurring value in a set of numbers.
  4. Range – The difference between the highest and lowest values.
  5. Variance – Measures the dispersion of a set of data points.
  6. Standard Deviation – The square root of the variance, representing the average amount of variability in a set of data.
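
For readers working in Python, here is a minimal sketch of these descriptive measures using only the standard library's statistics module; the sample values are made up for illustration.

```python
# Minimal sketch: descriptive statistics with Python's standard library.
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # illustrative sample

mean = statistics.mean(data)            # average
median = statistics.median(data)        # middle value
mode = statistics.mode(data)            # most frequent value
data_range = max(data) - min(data)      # highest minus lowest
variance = statistics.variance(data)    # sample variance
std_dev = statistics.stdev(data)        # sample standard deviation

print(mean, median, mode, data_range, variance, std_dev)
```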

Inferential Statistics

  1. Population – The entire group that you want to draw conclusions about.
  2. Sample – A subset of the population used to represent the population.
  3. Hypothesis Testing – A method for testing a claim or hypothesis about a parameter in a population.
  4. Confidence Interval – A range of values that is likely to contain the population parameter.
  5. p-value – The probability of observing a test statistic as extreme as, or more extreme than, the value observed under the null hypothesis.
  6. t-test – A statistical test used to compare the means of two groups.
  7. ANOVA (Analysis of Variance) – A statistical method used to compare the means of three or more samples.
  8. Chi-Square Test – A test that measures how expectations compare to actual observed data.
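
As an illustration, the sketch below runs a two-sample t-test and a chi-square goodness-of-fit test, assuming SciPy is available; all numbers are invented.

```python
# Illustrative sketch: a t-test and a chi-square test with SciPy.
from scipy import stats

group_a = [5.1, 4.9, 5.4, 5.0, 5.2]
group_b = [5.8, 6.0, 5.7, 6.1, 5.9]

# Compare the means of two independent groups.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Compare observed counts against expected counts.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```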

Regression Analysis

  1. Linear Regression – A method to model the relationship between a dependent variable and one or more independent variables.
  2. Multiple Regression – An extension of linear regression that uses multiple independent variables to predict a dependent variable.
  3. Logistic Regression – A regression model used for binary classification.
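
A hedged sketch of both model types, assuming scikit-learn and NumPy are installed and using made-up data:

```python
# Illustrative sketch: linear and logistic regression with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: one independent variable predicting a numeric outcome.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])
lin = LinearRegression().fit(X, y)
print("slope:", lin.coef_[0], "intercept:", lin.intercept_)

# Logistic regression: binary classification from two predictors.
X2 = np.array([[1, 0], [2, 1], [3, 1], [4, 0], [5, 1], [6, 0]])
y2 = np.array([0, 0, 0, 1, 1, 1])
log = LogisticRegression().fit(X2, y2)
print("predicted class for [3.5, 1]:", log.predict([[3.5, 1]])[0])
```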

Correlation

  1. Pearson Correlation – Measures the linear relationship between two continuous variables.
  2. Spearman Rank Correlation – Measures the strength and direction of association between two ranked variables.
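
Both coefficients can be computed in a few lines, assuming SciPy is available; the data are illustrative.

```python
# Illustrative sketch: Pearson and Spearman correlations with SciPy.
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 8]

pearson_r, pearson_p = stats.pearsonr(x, y)     # linear relationship
spearman_r, spearman_p = stats.spearmanr(x, y)  # rank-based association
print("Pearson r:", pearson_r, "Spearman rho:", spearman_r)
```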

Advanced Statistical Methods

  1. Factor Analysis – A method used to identify underlying relationships between variables.
  2. Cluster Analysis – A method used to group similar data points together.
  3. Time Series Analysis – Techniques used to analyze time-ordered data points.
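
As a small illustration of cluster analysis, the sketch below groups synthetic points with k-means, assuming scikit-learn and NumPy are installed.

```python
# Illustrative sketch: cluster analysis (k-means) on made-up 2-D points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two loose groups of points, invented for illustration.
points = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("cluster labels:", kmeans.labels_)
print("cluster centers:", kmeans.cluster_centers_)
```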

Data Visualization

  1. Histogram – A graphical representation of the distribution of numerical data.
  2. Box Plot – A standardized way of displaying the distribution of data based on a five-number summary.
  3. Scatter Plot – A graph that displays the values of (typically) two variables for a set of data as points.
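
A minimal Matplotlib sketch of the three plot types, assuming Matplotlib and NumPy are installed and using randomly generated data:

```python
# Illustrative sketch: histogram, box plot, and scatter plot with matplotlib.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=10, size=200)
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(0, 3, size=100)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(values, bins=20)    # histogram: distribution of values
axes[0].set_title("Histogram")
axes[1].boxplot(values)          # box plot: five-number summary
axes[1].set_title("Box plot")
axes[2].scatter(x, y, s=10)      # scatter plot: two variables
axes[2].set_title("Scatter plot")
plt.tight_layout()
plt.show()
```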

Non-parametric Tests

  1. Mann-Whitney U Test – A test used to compare differences between two independent groups.
  2. Wilcoxon Signed-Rank Test – A test used to compare two paired groups.
  3. Kruskal-Wallis Test – An extension of the Mann-Whitney U test for comparing more than two groups.
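
All three tests are available in SciPy; the sketch below applies them to invented samples.

```python
# Illustrative sketch: non-parametric tests with SciPy.
from scipy import stats

group_a = [12, 15, 14, 10, 13]
group_b = [18, 21, 17, 19, 20]
group_c = [25, 23, 27, 26, 24]
before = [80, 85, 78, 90, 88]
after = [82, 88, 80, 91, 95]

print(stats.mannwhitneyu(group_a, group_b))      # two independent groups
print(stats.wilcoxon(before, after))             # two paired groups
print(stats.kruskal(group_a, group_b, group_c))  # three or more groups
```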

What Is a Research Question?

A research question is a specific, focused, and concise query that guides the investigation and helps to identify the problem or issue to be addressed. It should be:
1. _Clear_: Easy to understand and interpret.
2. _Specific_: Well-defined and focused.
3. _Measurable_: Can be answered through data collection and analysis.
4. _Relevant_: Aligns with the research objectives and hypothesis.
5. _Feasible_: Can be answered within the scope of the study.
Research questions can be:
1. _Descriptive_: Seeking to describe a phenomenon or situation.
Example: “What are the demographic characteristics of patients with diabetes?”
2. _Comparative_: Comparing two or more groups or conditions.
Example: “Is there a difference in blood pressure between patients with and without hypertension?”
3. _Causal_: Examining the relationship between variables.
Example: “Does regular exercise reduce the risk of heart disease?”
4. _Exploratory_: Exploring a new area or phenomenon.
Example: “What are the experiences of patients with chronic pain?”
Example of a research question:
“What is the effectiveness of a new medication in reducing symptoms of depression in adults?”
A well-crafted research question serves as a guide for the entire research process, helping to ensure that the study stays focused and on track.

Research Methods for Using AI Applications in Public Transport

 Researching the implementation of AI applications in public transport involves a multidisciplinary approach, combining data science, engineering, urban planning, and social sciences. Below are some key research methods to explore the potential and impact of AI in public transportation systems:

1. Data Collection and Analysis

a. Sources of Data:

  • Sensors and IoT Devices: Install sensors on vehicles and infrastructure to collect data on traffic patterns, vehicle health, and passenger flow.
  • GPS and Tracking Systems: Use GPS data to monitor vehicle locations and movements.
  • Ticketing Systems: Analyze data from smart ticketing systems to understand passenger usage and behavior.
  • Surveys and Interviews: Conduct surveys and interviews with passengers and transport operators to gather qualitative data.

b. Data Processing and Cleaning:

  • Use data cleaning techniques to handle missing values, outliers, and inconsistencies.
  • Apply data integration methods to combine data from multiple sources.

c. Data Analysis Techniques:

  • Descriptive Analytics: Summarize the main characteristics of the data.
  • Predictive Analytics: Use machine learning algorithms to predict future trends and potential issues.
  • Prescriptive Analytics: Develop optimization models to suggest actions based on predictive analytics.

2. Machine Learning and AI Modeling

a. Algorithm Selection:

  • Supervised Learning: For tasks like predictive maintenance and demand forecasting.
  • Unsupervised Learning: For clustering passenger data and identifying patterns.
  • Reinforcement Learning: For optimizing traffic management and route planning.

b. Model Training and Validation:

  • Split the data into training and testing sets.
  • Use cross-validation techniques to ensure model robustness.
  • Evaluate model performance using metrics like accuracy, precision, recall, and F1 score.
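
The sketch below illustrates this workflow on a synthetic classification data set, assuming scikit-learn is installed; the data and the choice of a random forest are purely illustrative.

```python
# Illustrative sketch: train/test split, cross-validation, and metrics.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # robustness check
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("CV accuracy:", cv_scores.mean())
print("accuracy:", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall:", recall_score(y_test, pred))
print("F1:", f1_score(y_test, pred))
```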

c. Model Deployment:

  • Develop scalable architectures for real-time data processing.
  • Implement continuous learning systems that update models based on new data.

3. Simulation and Modeling

a. Traffic Simulation:

  • Use traffic simulation software (e.g., SUMO, MATSim) to model the impact of AI-driven traffic management systems.
  • Simulate different scenarios to evaluate the effectiveness of AI interventions.

b. Autonomous Vehicle Testing:

  • Conduct controlled field tests of autonomous buses and trains.
  • Use virtual environments to test AI algorithms in a variety of conditions before real-world deployment.

c. Scenario Analysis:

  • Develop scenarios to understand the impact of AI on different aspects of public transport, such as safety, efficiency, and passenger satisfaction.

4. Human Factors and Usability Studies

a. User Experience (UX) Research:

  • Conduct usability testing of AI-powered ticketing and information systems.
  • Gather feedback from passengers to improve user interfaces and interaction designs.

b. Acceptance and Adoption Studies:

  • Use surveys and focus groups to understand public perception and acceptance of AI technologies in public transport.
  • Analyze the factors that influence the adoption of AI applications among different demographic groups.

c. Accessibility Evaluation:

  • Assess the accessibility of AI applications for passengers with disabilities.
  • Ensure that AI systems are inclusive and cater to the needs of all users.

5. Impact Assessment and Evaluation

a. Economic Analysis:

  • Conduct cost-benefit analysis to evaluate the financial viability of AI applications.
  • Analyze the impact of AI on operational costs, revenue, and economic growth.

b. Environmental Impact:

  • Measure the impact of AI applications on energy consumption and emissions.
  • Evaluate the potential of AI to contribute to sustainable transport solutions.

c. Social Impact:

  • Assess the impact of AI on job roles and employment in the public transport sector.
  • Study the broader social implications, including equity and accessibility issues.

6. Policy and Regulatory Studies

a. Regulatory Framework Analysis:

  • Study existing regulations related to AI and public transport.
  • Propose policy recommendations to facilitate the safe and effective deployment of AI technologies.

b. Ethical Considerations:

  • Investigate the ethical implications of AI applications, such as privacy concerns and data security.
  • Develop guidelines for ethical AI use in public transport.

c. Stakeholder Analysis:

  • Identify and analyze the roles of various stakeholders, including government agencies, transport operators, and passengers.
  • Develop strategies for stakeholder engagement and collaboration.

By employing these research methods, researchers can gain a comprehensive understanding of how AI can be effectively integrated into public transport systems, addressing technical, social, economic, and ethical challenges.

The Role of Natural Language Processing in Assessing Demand for New Infrastructure

 In the dynamic landscape of urban development and planning, the assessment of demand for new infrastructure is pivotal. As cities grow and evolve, the efficient planning of infrastructure—from roads and bridges to public transport and utilities—is crucial for sustainability and quality of life. Enter Natural Language Processing (NLP), a branch of artificial intelligence that has the potential to significantly enhance the methodologies used in infrastructure demand assessment.

Understanding NLP and Its Capabilities

Natural Language Processing involves the interaction between computers and humans through natural language. The goal of NLP is to read, decipher, understand, and make sense of human languages in a manner that is valuable. This technology processes large amounts of natural language data to extract insights and patterns that are not readily apparent to humans.

NLP in Infrastructure Demand Assessment

1. Data Collection and Analysis

  • Social Media and Online Forums: NLP can analyze discussions and sentiments expressed on social media platforms and online forums regarding infrastructure needs. By examining tweets, posts, and comments, NLP tools can gauge public opinion on existing infrastructure and potential demand for new projects.
  • Survey Data: Traditional surveys generate vast amounts of textual data, often in the form of open-ended responses. NLP can automate the analysis of these responses, providing quick and detailed insights into public sentiment and demand.

2. Predictive Analytics

  • Trend Analysis: NLP can identify trends in public opinion and emerging needs by analyzing changes in language and topics over time. This helps in predicting future demands and potential infrastructure challenges.
  • Sentiment Analysis: By assessing the sentiment behind the textual data gathered from various sources, NLP helps in understanding the public’s feelings towards proposed or existing infrastructure projects.
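
As one hedged illustration, sentiment scores for invented passenger comments can be computed with NLTK's VADER analyzer (assuming NLTK and its vader_lexicon resource are available); production systems would typically use more sophisticated, domain-tuned models.

```python
# Illustrative sketch: sentiment analysis of invented comments with NLTK VADER.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

comments = [
    "The new tram line has cut my commute in half, great work!",
    "Buses on route 12 are always late and overcrowded.",
]
for text in comments:
    scores = analyzer.polarity_scores(text)  # neg / neu / pos / compound
    print(scores["compound"], text)
```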

3. Enhancing Public Engagement

  • Feedback Mechanisms: NLP can be used to develop smarter feedback mechanisms where public input on infrastructure projects is gathered and analyzed in real-time. This can significantly improve the responsiveness of planning authorities to public needs.
  • Chatbots and Virtual Assistants: These tools can be deployed to interact with the public, gather data, and provide information on infrastructure projects, enhancing engagement and participation.

Case Studies and Applications

  • Singapore’s Smart Nation Initiative: Utilizing NLP to analyze communications and feedback on urban planning efforts, helping to guide decisions on where to focus infrastructural developments.
  • Transport for London (TfL): Using sentiment analysis to assess public reactions to changes in transport services and infrastructure, aiding in better decision-making and service adjustments.

Challenges and Considerations

While NLP offers substantial benefits, there are challenges in its implementation:

  • Data Privacy and Security: Handling large volumes of personal data requires robust security measures and adherence to privacy laws.
  • Accuracy and Context: NLP algorithms must be finely tuned to accurately interpret the nuances and context of language, which varies widely across different cultures and communities.

The Future of NLP in Urban Planning

As NLP technology advances, its integration into urban planning and infrastructure development is expected to deepen. Future applications could involve more advanced predictive models and real-time public sentiment analysis, leading to more responsive and effective urban infrastructure planning.

NLP presents a transformative potential for urban planning, offering a more nuanced understanding of public needs and expectations. By harnessing this technology, planners and policymakers can improve the efficiency and efficacy of infrastructure development, ultimately leading to smarter, more sustainable cities.

Data Required for Regression Analysis

Regression analysis requires the following data:
1. *Dependent variable* (Outcome or Response variable): The variable being predicted or explained.
2. *Independent variables* (Predictor or Explanatory variables): The variables used to predict the dependent variable.
3. *Sample size*: A sufficient number of observations (data points) to ensure reliable estimates.
4. *Data type*: Quantitative data (numerical or categorical) for both dependent and independent variables.
5. *No missing values*: Complete data for all variables, or appropriately handled missing values.
6. *Normality*: The residuals (prediction errors) should be approximately normally distributed; skewed variables are often transformed toward normality.
7. *Linearity*: Relationship between dependent and independent variables should be linear.
8. *Homoscedasticity*: Constant variance of residuals across all levels of independent variables.
9. *No multicollinearity*: Independent variables should not be highly correlated with each other.
10. *Random sampling*: Data should be collected through random sampling to ensure representativeness.
Note: The specific data requirements may vary depending on the type of regression analysis (e.g., linear, logistic, multiple) and the research question being addressed.
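
Two of these assumptions can be checked programmatically. The sketch below, assuming SciPy and statsmodels are installed and using simulated data, tests residual normality with the Shapiro-Wilk test and screens for multicollinearity with variance inflation factors.

```python
# Illustrative sketch: checking regression assumptions on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # three independent variables
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=100)

# Fit ordinary least squares and test residual normality (assumption 6).
X_const = sm.add_constant(X)
model = sm.OLS(y, X_const).fit()
stat, p = stats.shapiro(model.resid)
print("Shapiro-Wilk p-value for residuals:", p)

# Multicollinearity check (assumption 9): VIF above roughly 5-10 is a warning sign.
for i in range(1, X_const.shape[1]):              # skip the constant column
    print(f"VIF for variable {i}:", variance_inflation_factor(X_const, i))
```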

Difference Between Research Methods and Research Techniques

Here are the differences between research methods and research techniques in bullet points:
*Research Methods:*
– Overall approach to collecting and analyzing data
– Broad in scope, encompasses entire research design
– Examples: surveys, experiments, case studies, content analysis
– Purpose: to identify patterns, explain phenomena, predict outcomes
– High level of abstraction, provides framework for entire study
– Used throughout the entire research process
*Research Techniques:*
– Specific tools or procedures used to collect and analyze data
– Narrow in scope, focused on specific data collection or analysis tasks
– Examples: questionnaires, interviews, observations, statistical analysis
– Purpose: to collect data, measure variables, test hypotheses
– Low level of abstraction, provides specific tools for data collection and analysis
– Used during data collection and analysis stages

TOPSIS Research Method

 TOPSIS, which stands for Technique for Order Preference by Similarity to Ideal Solution, is a multi-criteria decision-making method used to identify the best alternative among a set of options. It is widely employed in various fields, including management, engineering, environmental sciences, and finance.

Here’s a description of the TOPSIS method:

  1. Identifying Criteria: First, you need to define the criteria or attributes that are relevant to your decision-making problem. These criteria should be measurable and contribute to evaluating the alternatives.

  2. Normalization: Once the criteria are identified, the next step involves normalizing the decision matrix. Normalization ensures that all criteria are on the same scale and have equal importance. This step is crucial to prevent bias towards any particular criterion.

  3. Weight Assignment: After normalization, weights are assigned to each criterion to reflect their relative importance. The weights are typically determined based on the preferences of decision-makers or through analytical methods such as the Analytic Hierarchy Process (AHP) or the Analytic Network Process (ANP).

  4. Ideal and Anti-Ideal Solutions: In TOPSIS, two reference points are used: the ideal solution and the anti-ideal solution. The ideal solution represents the best performance for each criterion, while the anti-ideal solution represents the worst performance. These solutions are constructed based on maximizing or minimizing each criterion.

  5. Calculating Proximity to Ideal Solution: For each alternative, the distance to the ideal solution and the distance to the anti-ideal solution are calculated. Typically, Euclidean distance or other distance metrics are used for this purpose.

  6. Relative Closeness to Ideal Solution: Once the distances are calculated, a relative closeness to the ideal solution is determined for each alternative. This is done by comparing the distances and calculating a score that indicates how close each alternative is to the ideal solution.

  7. Ranking Alternatives: Finally, the alternatives are ranked based on their relative closeness to the ideal solution. The alternative with the highest closeness score is considered the best option.
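
The seven steps above translate directly into a short numerical routine. Below is a hedged NumPy sketch with an invented decision matrix, equal weights, and assumed criterion directions.

```python
# Illustrative TOPSIS sketch following the steps above (NumPy only).
import numpy as np

# Rows = alternatives, columns = criteria (values invented).
matrix = np.array([
    [250, 16, 12, 5],
    [200, 16, 8, 3],
    [300, 32, 16, 4],
    [275, 32, 8, 4],
], dtype=float)
weights = np.array([0.25, 0.25, 0.25, 0.25])
benefit = np.array([False, True, True, True])  # True = higher is better

# Step 2: vector normalization; Step 3: apply weights.
norm = matrix / np.sqrt((matrix ** 2).sum(axis=0))
weighted = norm * weights

# Step 4: ideal and anti-ideal solutions per criterion.
ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
anti_ideal = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))

# Steps 5-6: Euclidean distances and relative closeness to the ideal.
d_ideal = np.sqrt(((weighted - ideal) ** 2).sum(axis=1))
d_anti = np.sqrt(((weighted - anti_ideal) ** 2).sum(axis=1))
closeness = d_anti / (d_ideal + d_anti)

# Step 7: rank alternatives, best first (1-based indices).
print("closeness:", closeness.round(3))
print("ranking (best first):", np.argsort(-closeness) + 1)
```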

TOPSIS provides a systematic and structured approach to decision-making, allowing decision-makers to consider multiple criteria simultaneously and select the most suitable alternative. It is intuitive, easy to implement, and can accommodate both quantitative and qualitative criteria. However, it is essential to ensure that the criteria and their weights accurately reflect the decision-maker’s preferences and objectives for the method to yield meaningful results.

Different Research Methodologies

Research methods can be broadly categorized into two main types: qualitative and quantitative. Each type has its own set of techniques and approaches, and researchers often use a combination of both methods to gain a more comprehensive understanding of a research question.

Here’s an overview of the main approaches:

  1. Quantitative Research Methods:

    • Experimental Research: Involves manipulating one or more independent variables to observe their effect on a dependent variable while controlling for other factors.
    • Survey Research: Uses questionnaires or interviews to gather data from a sample of participants, aiming to generalize findings to a larger population.
    • Observational Research: Involves systematically observing and recording behavior in its natural context without interference. Researchers may use structured or unstructured observations.
  2. Qualitative Research Methods:

    • Interviews: Conducting in-depth, semi-structured or unstructured interviews with participants to gather rich, detailed information about their experiences, opinions, and perspectives.
    • Focus Groups: Involves a group discussion led by a facilitator to explore a specific set of issues. It helps researchers understand group dynamics and diverse perspectives.
    • Case Study Research: In-depth investigation of a single individual, group, event, or situation to gain a holistic understanding and generate detailed, context-specific knowledge.
    • Ethnographic Research: Involves immersing the researcher in the culture or community being studied to gain a deep understanding of social phenomena in their natural context.
  3. Mixed-Methods Research:

    • Combines both quantitative and qualitative research methods within a single study to provide a more comprehensive understanding of the research question.
  4. Action Research:

    • Involves collaboration between researchers and practitioners to solve real-world problems. The process is iterative, with researchers and participants working together to implement and evaluate solutions.
  5. Meta-Analysis:

    • Involves the statistical analysis of existing research findings from multiple studies on a particular topic to draw generalizable conclusions.
  6. Longitudinal Research:

    • Conducts observations or measurements over an extended period to track changes or developments over time.

These research methods can be applied in various fields and disciplines, depending on the nature of the research question and the goals of the study. Researchers select the most appropriate method based on factors such as the research question, available resources, and the type of data needed.

Virtual Certificate Course in Research Methodology

 The Valedictory Programme of the first batch of the three-month virtual Certificate Course in Research Methodology, conducted by the Knowledge Resource Centre at the Indian Institute of Corporate Affairs (IICA), was held on Wednesday. The course had started on July 7th and was conducted online via Blackboard, concluding in October. The event was graced by esteemed dignitaries and witnessed the culmination of this unique programme.

Chief Guest Shri Praveen Kumar acknowledged the role of IICA, as a corporate institution, in the research field and shared his appreciation for the feedback from the participants and the successful completion of the programme’s first batch. Shri Kumar also presented digital certificates to the participants in recognition of their achievements.

Prof. Sangeeta Jauhari, Guest of Honour and Pro Vice-Chancellor of Rabindranath Tagore University, Bhopal, highlighted the significance of quality research in the current scenario, emphasising its value and impact.

Earlier, moderating the function, Dr. Lata Suresh, Course Director, IICA, introduced all the participants and thanked them for actively participating in the programme. Dr. Suresh also acknowledged the dedication and collective efforts of the esteemed faculty members, including Ms. Oluwatoyin Oyekenu, Dr. Santosh H, Dr. Sumit Narula, Ferdinand, Dr. Adesh Chaturvedi, and Dr. Ameer Hussain.

The students shared their positive feedback on the course, reflecting on their experiences and the knowledge they had gained in research methodology and in writing research proposals.

The programme concluded with a vote of thanks, expressing gratitude to all attendees, guests, faculty, and participants for their contributions and support in making the First Batch of the Certificate Course in Research Methodology a success.


Research Methods: Definitions, Types, and Examples

Research methods are specific procedures for collecting and analyzing data. Developing your research methods is an integral part of your research design. When planning your methods, there are two key decisions you will make.
First, decide how you will collect data. Your methods depend on what type of data you need to answer your research question:
  • Qualitative vs. quantitative: Will your data take the form of words or numbers?
  • Primary vs. secondary: Will you collect original data yourself, or will you use data that has already been collected by someone else?
  • Descriptive vs. experimental: Will you take measurements of something as it is, or will you perform an experiment?
Second, decide how you will analyze the data:
  • For quantitative data, you can use statistical analysis methods to test relationships between variables.
  • For qualitative data, you can use methods such as thematic analysis to interpret patterns and meanings in the data.
Methods for collecting data
Data is the information that you collect for the purposes of answering your research question. The type of data you need depends on the aims of your research.
Qualitative vs. quantitative data
Your choice of qualitative or quantitative data collection depends on the type of knowledge you want to develop.
For questions about ideas, experiences and meanings, or to study something that can’t be described numerically, collect qualitative data.
If you want to develop a more mechanistic understanding of a topic, or your research involves hypothesis testing, collect quantitative data.
Qualitative
  • Pros: Flexible – you can often adjust your methods as you go to develop new knowledge; can be conducted with small samples.
  • Cons: Can’t be analyzed statistically, and not generalizable to broader populations; difficult to standardize research, at higher risk for research bias.
Quantitative
  • Pros: Can be used to systematically describe large collections of things; generates reproducible knowledge.
  • Cons: Requires statistical training to analyze data; requires larger samples.
You can also take a mixed methods approach, where you use both qualitative and quantitative research methods.
Primary vs. secondary research
Primary research is any original data that you collect yourself for the purposes of answering your research question (e.g. through surveys, observations and experiments). Secondary research is data that has already been collected by other researchers (e.g. in a government census or previous scientific studies).
If you are exploring a novel research question, you’ll probably need to collect primary data. But if you want to synthesize existing knowledge, analyze historical trends, or identify patterns on a large scale, secondary data might be a better choice.
Primary
  • Pros: Can be collected to answer your specific research question; you have control over the sampling and measurement methods.
  • Cons: More expensive and time-consuming to collect; requires training in data collection methods.
Secondary
  • Pros: Easier and faster to access; you can collect data that spans longer timescales and broader geographical locations.
  • Cons: No control over how the data was generated; requires extra processing to make sure it works for your analysis.
Descriptive vs. experimental data
In descriptive research, you collect data about your study subject without intervening. The validity of your research will depend on your sampling method.
In experimental research, you systematically intervene in a process and measure the outcome. The validity of your research will depend on your experimental design.
To conduct an experiment, you need to be able to vary your independent variable, precisely measure your dependent variable, and control for confounding variables. If it’s practically and ethically possible, this method is the best choice for answering questions about cause and effect.
Descriptive
  • Pros: Allows you to describe your research subject without influencing it; accessible – you can gather more data on a larger scale.
  • Cons: No control over confounding variables; can’t establish causality.
Experimental
  • Pros: More control over confounding variables; can establish causality.
  • Cons: You might influence your research subject in unexpected ways; usually requires more expertise and resources to collect data.
Research methods for collecting data
  • Experiment – Primary; quantitative. Use to test cause-and-effect relationships.
  • Survey – Primary; quantitative. Use to understand general characteristics of a population.
  • Interview/focus group – Primary; qualitative. Use to gain a more in-depth understanding of a topic.
  • Observation – Primary; qualitative or quantitative. Use to understand how something occurs in its natural setting.
  • Literature review – Secondary; qualitative or quantitative. Use to situate your research in an existing body of work, or to evaluate trends within a research topic.
  • Case study – Primary or secondary; qualitative or quantitative. Use to gain an in-depth understanding of a specific group or context, or when you don’t have the resources for a large study.
Methods for analyzing data
Your data analysis methods will depend on the type of data you collect and how you prepare it for analysis.
Data can often be analyzed both quantitatively and qualitatively. For example, survey responses could be analyzed qualitatively by studying the meanings of responses or quantitatively by studying the frequencies of responses.
Qualitative analysis methods
Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that was collected:
  • From open-ended surveys and interviews, literature reviews, case studies, ethnographies, and other sources that use text rather than numbers.
  • Using non-probability sampling methods.
Qualitative analysis tends to be quite flexible and relies on the researcher’s judgement, so you have to reflect carefully on your choices and assumptions and be careful to avoid research bias.
Quantitative analysis methods
Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive studies) or cause-and-effect relationships (in experiments).
You can use quantitative analysis to interpret data that was collected either:
  • During an experiment, or
  • Using probability sampling methods.
Because the data is collected and analyzed in a statistically valid way, the results of quantitative analysis can be easily standardized and shared among researchers.
Research methods for analyzing data
  • Statistical analysis – Quantitative. Use to analyze data collected in a statistically valid manner (e.g. from experiments, surveys, and observations).
  • Meta-analysis – Quantitative. Use to statistically analyze the results of a large collection of studies; it can only be applied to studies that collected data in a statistically valid manner.
  • Thematic analysis – Qualitative. Use to analyze data collected from interviews, focus groups, or textual sources, and to understand general themes in the data and how they are communicated.
  • Content analysis – Either. Use to analyze large volumes of textual or visual data collected from surveys, literature reviews, or other sources; it can be quantitative (i.e. frequencies of words) or qualitative (i.e. meanings of words).

Methods commonly used to substitute inconsistent values in a data set

There are several methods commonly used to substitute inconsistent values in a data set. Here are a few examples:
Mean/Median Imputation: In this method, the missing or inconsistent values are replaced with the mean or median value of the corresponding variable. This method assumes that the missing values are missing completely at random and replaces them with the central tendency of the variable.
Regression Imputation: In this method, the inconsistent values are replaced by predicting their values based on a regression model. The model is built using the other variables in the data set that are not missing or inconsistent.
Hot Deck Imputation: In this method, the inconsistent values are replaced with values from similar cases in the data set. Similar cases are identified based on a set of matching variables. This method preserves the pattern of the data and replaces missing values with observed values from similar cases.
Multiple Imputation: This method involves creating multiple imputations for the inconsistent values based on statistical models. Multiple imputations allow for uncertainty in the imputed values and provide a range of plausible values for each inconsistent value.
Expert Knowledge: In some cases, domain experts or researchers may have specific knowledge about the data and can manually substitute inconsistent values based on their expertise. This method relies on human judgment and expertise.
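
As a small illustration of the mean/median imputation described above, the following pandas sketch fills missing values in an invented data frame.

```python
# Illustrative sketch: mean and median imputation with pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, 32, np.nan, 41, 29],
                   "income": [40000, np.nan, 52000, 61000, np.nan]})

df["age"] = df["age"].fillna(df["age"].mean())             # mean imputation
df["income"] = df["income"].fillna(df["income"].median())  # median imputation
print(df)
```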
It’s important to note that the choice of method for substituting inconsistent values depends on the nature of the data, the extent of inconsistency, and the assumptions made about the missing or inconsistent values. Different methods may be more appropriate in different situations, and it’s essential to consider the limitations and potential biases introduced by the chosen method.
Among the options provided, the correct method to substitute inconsistent values in a data set is “Editing.” Editing involves identifying and correcting inconsistent or erroneous values in the data set. It typically involves a manual or automated review of the data to detect outliers, errors, or inconsistencies. Once identified, the inconsistent values can be corrected or replaced using appropriate methods such as imputation or removal. Coding and elimination, on the other hand, are not specific methods for substituting inconsistent values but can be part of the data cleaning process. Coding refers to assigning numerical codes or categories to represent certain values or variables, while elimination involves removing observations or variables from the data set.

Difference between primary and secondary data

The following points distinguish primary and secondary data:
  1. Meaning, example, and definition
  2. Data’s originality
  3. Need of adjustment
  4. Data sources
  5. Type of data
  6. Methods used to collect data
  7. Obtained data’s reliability
  8. The time consumed
  9. Need of investigator
  10. Cost effectiveness
  11. When are the data collected?
  12. Capability to solve a problem
  13. Suitability to meet the requirement
  14. Bias or personal prejudice
  15. Who collects the data?
  16. Precaution before using the data
Now let’s compare primary and secondary data on the above sixteen points.
1. Meaning, example, and definition
Primary data are fresh (new) information collected for the first time by a researcher himself for a particular purpose. It is unique, first-hand, qualitative information not published before. It is collected systematically from its place or source of origin by the researcher himself or his appointed agents. It is obtained initially as a result of research efforts taken by a researcher (and his team) with some objective in mind. It helps to solve certain problems concerned with any domain of choice or sphere of interest. Once it is used up for any required purpose, its original character is lost, and it turns into secondary data.
One must note that even if data are originally collected by somebody else from their source for his own study but never used, the collected data are still called primary data; once used, however, they turn into secondary data.
Imagine visiting an unexplored cave to investigate it and later recording its minute details for publication; this is an example of primary data collection.
Wessel’s definition of primary data,
“Data originally collected in the process of investigation are known as primary data.”
Secondary data, on the other hand, are information already collected by others or somebody else and later used by a researcher (or investigator) to answer the questions in hand. Hence, they are also called second-hand data. They are ready-made, quantitative information obtained mostly from different published sources like companies’ reports, statistics published by the government, etc. Here the required information is extracted from the already known works of others (e.g. published by a subject scholar, an organization, a government agency, etc.). It is readily available to a researcher at his desk or place of work.
Assume you are preparing a brief report on your country’s population and refer to the census published by the government; this is an example of secondary data collection.
Sir Wessel defined secondary data in simple words as:
“Data collected by other persons are called secondary data.”
Another definition of secondary data, in the words of M. M. Blair:
“Secondary data are those which are already in existence and collected for some other purpose than the answering of the question in hand.”
2. Data’s originality
Primary data are collected by a researcher (or investigator) at the place or source of their origin. They are original, unique information.
Secondary data are collected by a researcher (or investigator) from the already existing works of others. They are neither original nor unique information.
3. Need of adjustment
Primary data collection is done to accomplish a fixed objective and with a specific focus in mind. Hence, the data do not need any prior adjustment before being used to satisfy the purpose of the inquiry.
Secondary data are the work of someone else, collected for other purposes, and are not focused on meeting the researcher’s objective. As a result, they need to be properly adjusted and arranged before actual use; only after proper adjustment can they be adapted, to some extent, to achieve the researcher’s aim.
4. Data sources
Primary data are collected systematically through the following activities:
  • Conducting surveys,
  • Taking in-depth interviews of respondents (the individuals who give the necessary information to the interviewer),
  • Experimentation,
  • Direct observations,
  • Ethnographic research (which primarily involves the study of an ethnic group of people and their culture),
  • Focus groups,
  • Participatory research, etc.
The collection of secondary data is from internal and external published sources.
Internal sources of secondary data are:
  • Company’s accounts,
  • Sales figures,
  • Reports and records,
  • Promotional campaigns’ data,
  • Customers’ feedback,
  • Cost information,
  • Marketing activities, and so on.
External sources of secondary data include:
  • Data published by the country’s central, state, and local governments,
  • Data published by foreign governments,
  • Publications released by international organizations (like the IMF, WHO, ILO, UNO, WWF, etc.) and their subsidiary bodies,
  • Reports prepared by various commissions and other appointed committees,
  • Results of research work published by research institutions, universities, subject scholars, economists, etc.,
  • Books, newspapers, and magazines,
  • Reports and journals of trade unions, industries, and business associations,
  • Information released by a central bank, stock exchanges, etc.,
  • Public libraries,
  • Archives, directories, databases, and indexes,
  • Old historical records,
  • Online websites, blogs, and forums.
Note: Sometimes, though rarely, even unpublished information still available in office records can also be used as secondary data.
5. Type of data
Primary data provide qualitative information. That is, they give information on subjective, quality-related features like the look, feel, taste, lightness, heaviness, etc., of the object or phenomenon under research or inquiry.
On the contrary, secondary data provide quantitative information. In other words, they give information about an object or event in a numerical, statistical, tabulated form, such as percentages, lists, tables, etc.
6. Methods used to collect data
Methods used to collect primary data are as follows:
  • Observation, experimentation, and interview methods,
  • Direct personal investigation,
  • Indirect oral investigation,
  • Information collected through schedules and questionnaires (sets of questions) via the enumerator method (an enumerator is a survey worker involved in counting and listing) and the mailing method,
  • Information obtained from correspondents or local sources,
  • Some other minor methods: content analysis, consumer panels, use of mechanical devices, pantry audits, distributor or store audits, projective techniques (PT), warranty cards, etc.
The main methods used to collect secondary data are:
  • Desk research methods,
  • Searching on the Internet,
  • Going through media generated by consumers and their groups, and so on.
7. Obtained data’s reliability
Primary data are more reliable than secondary data, because they are collected through original research rather than through secondary sources, which may be subject to errors or discrepancies and may even contain outdated information.
Secondary data are less reliable than primary data, since they are based on research work done by others and not by the researcher himself. Here, the published information cannot always be verified accurately, as all the references used may not be available or mentioned in detail.
8. The time consumed
Reliability of primary data comes at the expense of the time their collection consumes, because it goes through the following steps:
  • First, the researcher draws a sample (i.e. a list of respondents to approach).
  • Then he prepares a questionnaire (i.e. a set of questions to be asked of the respondents).
  • Later, he appoints and trains a team of field interviewers to interview the respondents.
  • Finally, the researcher analyzes the data collected by the interviewers and draws conclusions from it.
Accomplishing the above procedure is not a quick task; it is time-consuming.
On the contrary, collection of secondary data consumes less time than primary data, because it is mostly done without interviews:
  • The researcher relies heavily on ready-made data and collects it from internal and external published sources (see point no. 4).
  • He depends on data already analyzed and concluded by someone else to gain an understanding of his subject topic or research interest.
  • He does not spend time appointing field interviewers and waiting for their data.
  • He saves his precious work hours, and, as a result, collecting secondary data takes him less time.
9. Need of investigator
Collection of primary data requires the availability of trained researchers or investigators, who must also be adequately supervised and controlled.
If the availability of trained investigators and the cost involved in hiring them is a problem, then secondary methods of data collection are recommended, since they do not require hiring such investigators.
10. Cost effectiveness
Primary data collection needs the appointment of a team that mainly comprises researchers, field interviewers, data analysts, and so on. Hiring these experts, along with other additional costs, demands that more funds be allocated to complete the research work on time. For this reason, it is a costly affair.
Secondary data collection does not require the appointment of such a team. Since no experts are hired, the cost is minimized; as a result, it is very economical.
11. When are the data collected?
Collection of primary data starts when secondary data seem insufficient to solve the problems associated with the research. The researcher first uses secondary data; only if he finds that the information collected from secondary sources is inadequate does he decide to collect primary data.
Secondary data collection is the first and most economical choice for most researchers when solving an identified problem or answering the objects of inquiry. Most information extraction is done here, and only if some information is unavailable is a decision taken to conduct primary research.
12. Capability to solve a problem
Primary data are fresh (new), original (unique), more accurate (almost correct), verified (confirmed), satisfies a requirement (as needed), up-to-date and current (latest). It gives the required information. For this reason, it is more capable of solving a problem.
Secondary data, on the other hand, may be less accurate or riddled with errors or discrepancies, not directly related (inconsistent) and even outdated (not latest). It gives only supporting and not the required information. As a result, it is less capable of solving a problem.
13. Suitability to meet the requirement
Primary data are suitable to meet the objects of inquiry because these are collected using systematic methods.
Collection of secondary data may or may not fulfill the actual requirement of a researcher.
14. Bias or personal prejudice
There is a possibility of personal prejudice or bias creeping in while collecting primary data because of the direct involvement of an investigator.
The possibility of prejudice is absent in secondary data because the information is not collected at first hand and, for this reason, is not subjected to any bias.
15. Who collects the data?
A researcher (an investigator) or his appointed agents collect the primary data.
Anyone other than those who gathered the primary data collects secondary data.
16. Precaution before using the data
Primary data collection is done systematically by the researcher himself or his agents, as instructed, with great care, planning, and organization, followed by verification of the obtained information. Such well-processed data are less likely to be subject to errors.
For this reason, no extra precautions are necessary while using primary data.
Secondary data, on the other hand, having been collected by others for different purposes, may be inconsistent (not as required), outdated, unverified, or subject to errors and mistakes. As a result, immense care must be taken when considering their use. If used without precaution, they may have an adverse impact on the quality of one’s research and greatly affect its credibility.
Conclusion
We can conclude that any data remain data, whether termed primary or secondary. What distinguishes one from the other is the degree of detachment from the source and how the data are collected (first-hand or second-hand) and used.
Any data become primary when first gathered by the collecting agency, and the same data become secondary when later used by others.
For example, data collected by an election commission are primary for it, and the same set of data is secondary for all except it.
Thus, Secrist lucidly describes this as follows,
“The distinction between primary and secondary data is one of the degrees. Data primary in the hands of one party may be secondary in the hands of others.”
References
References used and suggested reading for deeper understanding:
  1. Research Methodology: Methods and Techniques; by C. R. Kothari.
  2. Research Methodology: Data Presentation; by Dr. Y. K. Singh.
  3. Research Methodology; by Dr. C. Rajendar Kumar.
  4. Research Methodology and Statistical Analysis (for M.Com); by S. C. Aggarwal and S. K. Khurana.
  5. Statistics for Economics and Indian Economic Development; for Class 11; by T. R. Jain and V. K. Ohri.
  6. Statistics for Economics; Class 11; by Dr. D. P. Jain.
  7. International Business; 4th Edition; by Les Dlabay and James Scott.
  8. Marketing Research: An Applied Orientation; 5th Edition; by Naresh K. Malhotra and Satya Bhushan Dash.
  9. Marketing Research: Methodological Foundations; 10th Edition; by Gilbert A. Churchill, Jr. and Dawn Iacobucci.
  10. Office Organization and Management; 2nd Edition; by S. P. Arora.