Common Names and Terms Associated with Statistical Analysis

 Here are some common names and terms associated with statistical analysis:

Descriptive Statistics

  1. Mean – The average of a set of numbers.
  2. Median – The middle value in an ordered list of numbers.
  3. Mode – The most frequently occurring value in a set of numbers.
  4. Range – The difference between the highest and lowest values.
  5. Variance – Measures the dispersion of a set of data points.
  6. Standard Deviation – The square root of the variance, representing the average amount of variability in a set of data.
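
As a quick illustration, these six measures can be computed with Python's built-in statistics module (the sample values below are made up):

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # made-up sample values

mean = statistics.mean(data)           # average of the values
median = statistics.median(data)       # middle value of the ordered list
mode = statistics.mode(data)           # most frequently occurring value
value_range = max(data) - min(data)    # highest minus lowest value
variance = statistics.variance(data)   # sample variance (dispersion)
std_dev = statistics.stdev(data)       # sample standard deviation

print(mean, median, mode, value_range, variance, std_dev)
```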

Inferential Statistics

  1. Population – The entire group that you want to draw conclusions about.
  2. Sample – A subset of the population used to represent the population.
  3. Hypothesis Testing – A method for testing a claim or hypothesis about a parameter in a population.
  4. Confidence Interval – A range of values that is likely to contain the population parameter.
  5. p-value – The probability of observing a test statistic as extreme as, or more extreme than, the value observed under the null hypothesis.
  6. t-test – A statistical test used to compare the means of two groups.
  7. ANOVA (Analysis of Variance) – A statistical method used to compare the means of three or more samples.
  8. Chi-Square Test – A test that measures how expectations compare to actual observed data.
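
Several of these tests are available directly in SciPy. As a minimal sketch, a two-sample t-test and a chi-square goodness-of-fit test might look like this (the group values and counts are made up for illustration):

```python
from scipy import stats

# Hypothetical measurements for two independent groups
group_a = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7]
group_b = [13.2, 13.8, 12.9, 14.1, 13.5, 13.0]

# Two-sample t-test: the null hypothesis is that the group means are equal
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Chi-square goodness-of-fit: compare observed counts to expected counts
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value (e.g. below 0.05) would be taken as evidence against the null hypothesis in each case.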

Regression Analysis

  1. Linear Regression – A method to model the relationship between a dependent variable and one or more independent variables.
  2. Multiple Regression – An extension of linear regression that uses multiple independent variables to predict a dependent variable.
  3. Logistic Regression – A regression model used for binary classification.
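
A brief, illustrative sketch of linear and logistic regression using scikit-learn (the data arrays are invented purely to show the calls):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: model a continuous outcome from one predictor
X = np.array([[1], [2], [3], [4], [5]])      # independent variable
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])      # dependent variable

lin = LinearRegression().fit(X, y)
print("slope:", lin.coef_[0], "intercept:", lin.intercept_)

# Logistic regression: model a binary (0/1) outcome from two predictors
X2 = np.array([[1, 0], [2, 1], [3, 1], [4, 0], [5, 1], [6, 0]])
labels = np.array([0, 0, 0, 1, 1, 1])

log = LogisticRegression().fit(X2, labels)
print("predicted class for [3.5, 1]:", log.predict([[3.5, 1]])[0])
```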

Correlation

  1. Pearson Correlation – Measures the linear relationship between two continuous variables.
  2. Spearman Rank Correlation – Measures the strength and direction of association between two ranked variables.
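
Both correlation coefficients are available in SciPy; a minimal sketch with made-up paired data:

```python
from scipy import stats

x = [1.2, 2.4, 3.1, 4.8, 5.0, 6.3]   # e.g. hours studied (made up)
y = [50, 55, 61, 70, 72, 80]         # e.g. exam score (made up)

pearson_r, p_pearson = stats.pearsonr(x, y)       # linear relationship
spearman_rho, p_spearman = stats.spearmanr(x, y)  # rank-based association

print(f"Pearson r = {pearson_r:.3f} (p = {p_pearson:.4f})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {p_spearman:.4f})")
```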

Advanced Statistical Methods

  1. Factor Analysis – A method used to identify underlying relationships between variables.
  2. Cluster Analysis – A method used to group similar data points together.
  3. Time Series Analysis – Techniques used to analyze time-ordered data points.

Data Visualization

  1. Histogram – A graphical representation of the distribution of numerical data.
  2. Box Plot – A standardized way of displaying the distribution of data based on a five-number summary.
  3. Scatter Plot – A graph that plots the values of two variables for a set of data, with each point representing one observation.
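
A short matplotlib sketch that produces all three plot types from randomly generated, purely illustrative data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=200)   # made-up numerical data
x = rng.uniform(0, 10, size=100)
y = 2 * x + rng.normal(scale=2, size=100)         # made-up paired data

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].hist(values, bins=20)   # histogram: distribution of the data
axes[0].set_title("Histogram")

axes[1].boxplot(values)         # box plot: five-number summary
axes[1].set_title("Box plot")

axes[2].scatter(x, y, s=10)     # scatter plot: two variables per point
axes[2].set_title("Scatter plot")

plt.tight_layout()
plt.show()
```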

Non-parametric Tests

  1. Mann-Whitney U Test – A test used to compare differences between two independent groups.
  2. Wilcoxon Signed-Rank Test – A test used to compare two paired groups.
  3. Kruskal-Wallis Test – An extension of the Mann-Whitney U test for comparing more than two groups.
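
These tests are also available in SciPy; a minimal sketch with made-up scores:

```python
from scipy import stats

group_a = [3, 5, 4, 6, 7, 5]       # made-up scores, group A
group_b = [8, 9, 7, 10, 9, 8]      # made-up scores, group B
group_c = [5, 6, 7, 6, 8, 7]       # made-up scores, group C
before = [12, 15, 11, 14, 13, 16]  # paired measurements, before
after = [14, 17, 12, 15, 15, 18]   # paired measurements, after

u_stat, p_u = stats.mannwhitneyu(group_a, group_b)      # two independent groups
w_stat, p_w = stats.wilcoxon(before, after)             # two paired groups
h_stat, p_h = stats.kruskal(group_a, group_b, group_c)  # three or more groups

print(p_u, p_w, p_h)
```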

Research Methods for Using AI Applications in Public Transport

 Researching the implementation of AI applications in public transport involves a multidisciplinary approach, combining data science, engineering, urban planning, and social sciences. Below are some key research methods to explore the potential and impact of AI in public transportation systems:

1. Data Collection and Analysis

a. Sources of Data:

  • Sensors and IoT Devices: Install sensors on vehicles and infrastructure to collect data on traffic patterns, vehicle health, and passenger flow.
  • GPS and Tracking Systems: Use GPS data to monitor vehicle locations and movements.
  • Ticketing Systems: Analyze data from smart ticketing systems to understand passenger usage and behavior.
  • Surveys and Interviews: Conduct surveys and interviews with passengers and transport operators to gather qualitative data.

b. Data Processing and Cleaning:

  • Use data cleaning techniques to handle missing values, outliers, and inconsistencies.
  • Apply data integration methods to combine data from multiple sources.
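
As an illustration of these cleaning steps, here is a small pandas sketch; the column names and values are hypothetical stand-ins for transport data:

```python
import numpy as np
import pandas as pd

# Hypothetical passenger-count records; column names are made up for illustration
df = pd.DataFrame({
    "stop_id": [101, 102, 103, 104, 105],
    "passengers": [23, np.nan, 31, 400, 27],  # NaN = missing, 400 = suspect outlier
    "delay_min": [2.0, 3.5, np.nan, 1.0, 2.5],
})

# Handle missing values: fill passenger counts with the median, delays with the mean
df["passengers"] = df["passengers"].fillna(df["passengers"].median())
df["delay_min"] = df["delay_min"].fillna(df["delay_min"].mean())

# Flag outliers with a simple IQR rule and cap them at the upper bound
q1, q3 = df["passengers"].quantile([0.25, 0.75])
upper = q3 + 1.5 * (q3 - q1)
df.loc[df["passengers"] > upper, "passengers"] = upper

print(df)
```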

c. Data Analysis Techniques:

  • Descriptive Analytics: Summarize the main characteristics of the data.
  • Predictive Analytics: Use machine learning algorithms to predict future trends and potential issues.
  • Prescriptive Analytics: Develop optimization models to suggest actions based on predictive analytics.

2. Machine Learning and AI Modeling

a. Algorithm Selection:

  • Supervised Learning: For tasks like predictive maintenance and demand forecasting.
  • Unsupervised Learning: For clustering passenger data and identifying patterns.
  • Reinforcement Learning: For optimizing traffic management and route planning.
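
As one concrete illustration of the unsupervised case, a clustering step for grouping passengers might look like the following scikit-learn sketch (the features and values are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical passenger features: [trips per week, average trip length in km]
passengers = np.array([
    [10, 5.0], [12, 4.5], [9, 6.0],    # frequent short-distance riders
    [2, 25.0], [3, 30.0], [1, 28.0],   # occasional long-distance riders
])

# Standardize the features so both contribute equally, then cluster into two groups
features = StandardScaler().fit_transform(passengers)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

print("cluster labels:", kmeans.labels_)
```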

b. Model Training and Validation:

  • Split the data into training and testing sets.
  • Use cross-validation techniques to ensure model robustness.
  • Evaluate model performance using metrics like accuracy, precision, recall, and F1 score.
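
A minimal scikit-learn sketch of this training and validation workflow, using a synthetic data set as a stand-in for real transport data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic stand-in for a labelled data set (e.g. component failure / no failure)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0)

# Cross-validation on the training set to check robustness
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("cross-validation accuracy:", cv_scores.mean())

# Fit on the training set and evaluate on the held-out test set
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
```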

c. Model Deployment:

  • Develop scalable architectures for real-time data processing.
  • Implement continuous learning systems that update models based on new data.

3. Simulation and Modeling

a. Traffic Simulation:

  • Use traffic simulation software (e.g., SUMO, MATSim) to model the impact of AI-driven traffic management systems.
  • Simulate different scenarios to evaluate the effectiveness of AI interventions.

b. Autonomous Vehicle Testing:

  • Conduct controlled field tests of autonomous buses and trains.
  • Use virtual environments to test AI algorithms in a variety of conditions before real-world deployment.

c. Scenario Analysis:

  • Develop scenarios to understand the impact of AI on different aspects of public transport, such as safety, efficiency, and passenger satisfaction.

4. Human Factors and Usability Studies

a. User Experience (UX) Research:

  • Conduct usability testing of AI-powered ticketing and information systems.
  • Gather feedback from passengers to improve user interfaces and interaction designs.

b. Acceptance and Adoption Studies:

  • Use surveys and focus groups to understand public perception and acceptance of AI technologies in public transport.
  • Analyze the factors that influence the adoption of AI applications among different demographic groups.

c. Accessibility Evaluation:

  • Assess the accessibility of AI applications for passengers with disabilities.
  • Ensure that AI systems are inclusive and cater to the needs of all users.

5. Impact Assessment and Evaluation

a. Economic Analysis:

  • Conduct cost-benefit analysis to evaluate the financial viability of AI applications.
  • Analyze the impact of AI on operational costs, revenue, and economic growth.

b. Environmental Impact:

  • Measure the impact of AI applications on energy consumption and emissions.
  • Evaluate the potential of AI to contribute to sustainable transport solutions.

c. Social Impact:

  • Assess the impact of AI on job roles and employment in the public transport sector.
  • Study the broader social implications, including equity and accessibility issues.

6. Policy and Regulatory Studies

a. Regulatory Framework Analysis:

  • Study existing regulations related to AI and public transport.
  • Propose policy recommendations to facilitate the safe and effective deployment of AI technologies.

b. Ethical Considerations:

  • Investigate the ethical implications of AI applications, such as privacy concerns and data security.
  • Develop guidelines for ethical AI use in public transport.

c. Stakeholder Analysis:

  • Identify and analyze the roles of various stakeholders, including government agencies, transport operators, and passengers.
  • Develop strategies for stakeholder engagement and collaboration.

By employing these research methods, researchers can gain a comprehensive understanding of how AI can be effectively integrated into public transport systems, addressing technical, social, economic, and ethical challenges.

TOPSIS Research Method

 TOPSIS, which stands for Technique for Order Preference by Similarity to Ideal Solution, is a multi-criteria decision-making method used to identify the best alternative among a set of options. It is widely employed in various fields, including management, engineering, environmental sciences, and finance.

Here’s a description of the TOPSIS method:

  1. Identifying Criteria: First, you need to define the criteria or attributes that are relevant to your decision-making problem. These criteria should be measurable and contribute to evaluating the alternatives.

  2. Normalization: Once the criteria are identified, the next step involves normalizing the decision matrix. Normalization ensures that all criteria are on the same scale and have equal importance. This step is crucial to prevent bias towards any particular criterion.

  3. Weight Assignment: After normalization, weights are assigned to each criterion to reflect their relative importance. The weights are typically determined based on the preferences of decision-makers or through analytical methods such as the Analytic Hierarchy Process (AHP) or the Analytic Network Process (ANP).

  4. Ideal and Anti-Ideal Solutions: In TOPSIS, two reference points are used: the ideal solution and the anti-ideal solution. The ideal solution represents the best performance for each criterion, while the anti-ideal solution represents the worst performance. These solutions are constructed based on maximizing or minimizing each criterion.

  5. Calculating Proximity to Ideal Solution: For each alternative, the distance to the ideal solution and the distance to the anti-ideal solution are calculated. Typically, Euclidean distance or other distance metrics are used for this purpose.

  6. Relative Closeness to Ideal Solution: Once the distances are calculated, a relative closeness to the ideal solution is determined for each alternative. This is done by comparing the distances and calculating a score that indicates how close each alternative is to the ideal solution.

  7. Ranking Alternatives: Finally, the alternatives are ranked based on their relative closeness to the ideal solution. The alternative with the highest closeness score is considered the best option.

TOPSIS provides a systematic and structured approach to decision-making, allowing decision-makers to consider multiple criteria simultaneously and select the most suitable alternative. It is intuitive, easy to implement, and can accommodate both quantitative and qualitative criteria. However, it is essential to ensure that the criteria and their weights accurately reflect the decision-maker’s preferences and objectives for the method to yield meaningful results.
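
To make the steps concrete, here is a minimal NumPy sketch of TOPSIS for a small, made-up decision matrix (three alternatives, three criteria); the weights and the benefit/cost designation of each criterion are illustrative assumptions:

```python
import numpy as np

# Decision matrix: rows = alternatives, columns = criteria (made-up values)
X = np.array([
    [250.0, 16.0, 12.0],
    [200.0, 20.0, 8.0],
    [300.0, 11.0, 16.0],
])
weights = np.array([0.4, 0.35, 0.25])    # assumed relative importance
benefit = np.array([True, True, False])  # True = maximize, False = minimize

# 1. Vector normalization of the decision matrix, then apply the weights
norm = X / np.sqrt((X ** 2).sum(axis=0))
V = norm * weights

# 2. Ideal and anti-ideal solutions for each criterion
ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
anti_ideal = np.where(benefit, V.min(axis=0), V.max(axis=0))

# 3. Euclidean distances to the ideal and anti-ideal solutions
d_pos = np.sqrt(((V - ideal) ** 2).sum(axis=1))
d_neg = np.sqrt(((V - anti_ideal) ** 2).sum(axis=1))

# 4. Relative closeness to the ideal solution (higher = better)
closeness = d_neg / (d_pos + d_neg)
ranking = np.argsort(-closeness)

print("closeness scores:", np.round(closeness, 3))
print("ranking (best first):", ranking)
```

The alternative listed first in the ranking has the highest closeness score and would be selected as the best option.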

Different Research Methodologies

Research methods can be broadly categorized into two main types: qualitative and quantitative. Each type has its own set of techniques and approaches, and researchers often use a combination of both methods to gain a more comprehensive understanding of a research question.

Here’s an overview of each type:

  1. Quantitative Research Methods:

    • Experimental Research: Involves manipulating one or more independent variables to observe their effect on a dependent variable while controlling for other factors.
    • Survey Research: Uses questionnaires or interviews to gather data from a sample of participants, aiming to generalize findings to a larger population.
    • Observational Research: Involves systematically observing and recording behavior in its natural context without interference. Researchers may use structured or unstructured observations.
  2. Qualitative Research Methods:

    • Interviews: Conducting in-depth, semi-structured or unstructured interviews with participants to gather rich, detailed information about their experiences, opinions, and perspectives.
    • Focus Groups: Involves a group discussion led by a facilitator to explore a specific set of issues. It helps researchers understand group dynamics and diverse perspectives.
    • Case Study Research: In-depth investigation of a single individual, group, event, or situation to gain a holistic understanding and generate detailed, context-specific knowledge.
    • Ethnographic Research: Involves immersing the researcher in the culture or community being studied to gain a deep understanding of social phenomena in their natural context.
  3. Mixed-Methods Research:

    • Combines both quantitative and qualitative research methods within a single study to provide a more comprehensive understanding of the research question.
  4. Action Research:

    • Involves collaboration between researchers and practitioners to solve real-world problems. The process is iterative, with researchers and participants working together to implement and evaluate solutions.
  5. Meta-Analysis:

    • Involves the statistical analysis of existing research findings from multiple studies on a particular topic to draw generalizable conclusions.
  6. Longitudinal Research:

    • Conducts observations or measurements over an extended period to track changes or developments over time.

These research methods can be applied in various fields and disciplines, depending on the nature of the research question and the goals of the study. Researchers select the most appropriate method based on factors such as the research question, available resources, and the type of data needed.

Research Methods: Definitions, Types, and Examples

Research methods are specific procedures for collecting and analyzing data. Developing your research methods is an integral part of your research design. When planning your methods, there are two key decisions you will make.
First, decide how you will collect data. Your methods depend on what type of data you need to answer your research question:
  • Qualitative vs. quantitative: Will your data take the form of words or numbers?
  • Primary vs. secondary: Will you collect original data yourself, or will you use data that has already been collected by someone else?
  • Descriptive vs. experimental: Will you take measurements of something as it is, or will you perform an experiment?

Second, decide how you will analyze the data.
  • For quantitative data, you can use statistical analysis methods to test relationships between variables.
  • For qualitative data, you can use methods such as thematic analysis to interpret patterns and meanings in the data.

Methods for collecting data

Data is the information that you collect for the purposes of answering your research question. The type of data you need depends on the aims of your research.

Qualitative vs. quantitative data

Your choice of qualitative or quantitative data collection depends on the type of knowledge you want to develop.

  • For questions about ideas, experiences and meanings, or to study something that can’t be described numerically, collect qualitative data.
  • If you want to develop a more mechanistic understanding of a topic, or if your research involves hypothesis testing, collect quantitative data.

Pros and cons of each approach:

  • Qualitative – Pros: flexible, so you can often adjust your methods as you go to develop new knowledge; can be conducted with small samples. Cons: can’t be analyzed statistically and is not generalizable to broader populations; research is difficult to standardize and is at higher risk of research bias.
  • Quantitative – Pros: can be used to systematically describe large collections of things; generates reproducible knowledge. Cons: requires statistical training to analyze the data; requires larger samples.

You can also take a mixed methods approach, where you use both qualitative and quantitative research methods.

Primary vs. secondary research

Primary research is any original data that you collect yourself for the purposes of answering your research question (e.g. through surveys, observations and experiments). Secondary research is data that has already been collected by other researchers (e.g. in a government census or previous scientific studies).
If you are exploring a novel research question, you’ll probably need to collect primary data. But if you want to synthesize existing knowledge, analyze historical trends, or identify patterns on a large scale, secondary data might be a better choice.

Pros and cons of each approach:

  • Primary – Pros: can be collected to answer your specific research question; you have control over the sampling and measurement methods. Cons: more expensive and time-consuming to collect; requires training in data collection methods.
  • Secondary – Pros: easier and faster to access; you can collect data that spans longer timescales and broader geographical locations. Cons: no control over how the data was generated; requires extra processing to make sure it works for your analysis.

Descriptive vs. experimental data

In descriptive research, you collect data about your study subject without intervening. The validity of your research will depend on your sampling method.
In experimental research, you systematically intervene in a process and measure the outcome. The validity of your research will depend on your experimental design.
To conduct an experiment, you need to be able to vary your independent variable, precisely measure your dependent variable, and control for confounding variables. If it’s practically and ethically possible, this method is the best choice for answering questions about cause and effect.

Pros and cons of each approach:

  • Descriptive – Pros: allows you to describe your research subject without influencing it; accessible, so you can gather more data on a larger scale. Cons: no control over confounding variables; can’t establish causality.
  • Experimental – Pros: more control over confounding variables; can establish causality. Cons: you might influence your research subject in unexpected ways; usually requires more expertise and resources to collect data.

Research methods for collecting data

  • Experiment – Primary; quantitative. Use to test cause-and-effect relationships.
  • Survey – Primary; quantitative. Use to understand the general characteristics of a population.
  • Interview/focus group – Primary; qualitative. Use to gain a more in-depth understanding of a topic.
  • Observation – Primary; either qualitative or quantitative. Use to understand how something occurs in its natural setting.
  • Literature review – Secondary; either qualitative or quantitative. Use to situate your research in an existing body of work, or to evaluate trends within a research topic.
  • Case study – Either primary or secondary; either qualitative or quantitative. Use to gain an in-depth understanding of a specific group or context, or when you don’t have the resources for a large study.

Methods for analyzing data

Your data analysis methods will depend on the type of data you collect and how you prepare it for analysis.
Data can often be analyzed both quantitatively and qualitatively. For example, survey responses could be analyzed qualitatively by studying the meanings of responses or quantitatively by studying the frequencies of responses.

Qualitative analysis methods

Qualitative analysis is used to understand words, ideas, and experiences. You can use it to interpret data that was collected:

  • From open-ended surveys and interviews, literature reviews, case studies, ethnographies, and other sources that use text rather than numbers.
  • Using non-probability sampling methods.

Qualitative analysis tends to be quite flexible and relies on the researcher’s judgement, so you have to reflect carefully on your choices and assumptions and be careful to avoid research bias.

Quantitative analysis methods

Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive studies) or cause-and-effect relationships (in experiments). You can use quantitative analysis to interpret data that was collected either:

  • During an experiment.
  • Using probability sampling methods.

Because the data is collected and analyzed in a statistically valid way, the results of quantitative analysis can be easily standardized and shared among researchers.

Research methods for analyzing data

  • Statistical analysis – Quantitative. Use to analyze data collected in a statistically valid manner (e.g. from experiments, surveys, and observations).
  • Meta-analysis – Quantitative. Use to statistically analyze the results of a large collection of studies; it can only be applied to studies that collected data in a statistically valid manner.
  • Thematic analysis – Qualitative. Use to analyze data collected from interviews, focus groups, or textual sources, and to understand general themes in the data and how they are communicated.
  • Content analysis – Either qualitative or quantitative. Use to analyze large volumes of textual or visual data collected from surveys, literature reviews, or other sources; the analysis can be quantitative (i.e. frequencies of words) or qualitative (i.e. meanings of words).

Methods Commonly Used to Substitute Inconsistent Values in a Data Set

There are several methods commonly used to substitute inconsistent values in a data set. Here are a few examples:
  • Mean/Median Imputation: Missing or inconsistent values are replaced with the mean or median of the corresponding variable. This method assumes the values are missing completely at random and replaces them with the central tendency of the variable.
  • Regression Imputation: Inconsistent values are replaced by predictions from a regression model built using the other variables in the data set that are not missing or inconsistent.
  • Hot Deck Imputation: Inconsistent values are replaced with values from similar cases in the data set, where similar cases are identified based on a set of matching variables. This preserves the pattern of the data and replaces missing values with observed values from similar cases.
  • Multiple Imputation: Several imputations are created for each inconsistent value based on statistical models. Multiple imputation allows for uncertainty in the imputed values and provides a range of plausible values for each inconsistent value.
  • Expert Knowledge: In some cases, domain experts or researchers have specific knowledge about the data and can manually substitute inconsistent values based on their expertise. This method relies on human judgment.
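
As a small illustration of the first and third approaches, here is a pandas sketch with hypothetical column names; the group-based fill is only loosely in the spirit of hot-deck imputation, since it borrows a summary value from similar cases rather than an actual donor record:

```python
import numpy as np
import pandas as pd

# Hypothetical survey data with missing values (column names are made up)
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "age":    [34, np.nan, 52, 41, np.nan],
    "income": [42000, 38000, np.nan, 51000, 47000],
})

# Mean imputation: replace missing income with the variable's mean
df["income"] = df["income"].fillna(df["income"].mean())

# Group-based fill: borrow the median age from similar cases,
# where similarity is defined by the matching variable "region"
df["age"] = df["age"].fillna(df.groupby("region")["age"].transform("median"))

print(df)
```
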
It’s important to note that the choice of method for substituting inconsistent values depends on the nature of the data, the extent of inconsistency, and the assumptions made about the missing or inconsistent values. Different methods may be more appropriate in different situations, and it’s essential to consider the limitations and potential biases introduced by the chosen method.
Among the commonly listed options of editing, coding, and elimination, the method used to substitute inconsistent values in a data set is editing. Editing involves identifying and correcting inconsistent or erroneous values through a manual or automated review of the data to detect outliers, errors, or inconsistencies; once identified, the inconsistent values can be corrected or replaced using appropriate methods such as imputation or removal. Coding and elimination are not themselves methods for substituting inconsistent values, although both can be part of the data cleaning process: coding refers to assigning numerical codes or categories to represent certain values or variables, while elimination involves removing observations or variables from the data set.