The first step in any data analysis is to collect clear, accurate data. We obtained data on %diabetics, %inactivity, and %obesity from a project sheet and used Python’s NumPy to compute basic summary statistics: medians, means, and standard deviations. These calculations give us a foundational understanding of the dataset.
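As a rough illustration of the summary statistics described above, the sketch below computes a mean, median, and standard deviation with NumPy. The array values are hypothetical placeholders rather than the project's data, the helper name `summarize` is invented for this example, and the choice of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which form was used.

```python
import numpy as np

# Hypothetical placeholder values, NOT the project's data.
pct_diabetics = np.array([7.1, 8.4, 9.0, 10.2, 11.5])


def summarize(values):
    """Return the mean, median, and sample standard deviation of a 1-D array."""
    values = np.asarray(values, dtype=float)
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        # ddof=1 gives the sample standard deviation (an assumption here);
        # NumPy's default, ddof=0, gives the population form.
        "std": float(np.std(values, ddof=1)),
    }


summary = summarize(pct_diabetics)
print(summary)
```

The same helper could be applied to the %inactivity and %obesity columns once they are loaded as arrays.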
Our primary objective was to examine the relationship between the percentage of diabetics (%diabetics) and the percentage of inactive individuals (%inactivity). To do this, we constructed a scatterplot in which each region is a data point, which let us assess the connection between the two variables visually. We then fitted a regression model to these points and computed the R-squared value, which measures the proportion of the variation in %diabetics that the model explains from %inactivity. A higher R-squared value indicates a stronger association, which would suggest that inactivity may be related to diabetes rates, although it does not by itself establish causation. We also examined the residuals to validate the model and to check for anomalies or outliers.
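One way to obtain an R-squared value and residuals of the kind described above is a simple least-squares line fit. The sketch below assumes a straight-line model, which the text does not state explicitly, and uses hypothetical placeholder values; the function name `fit_line` is invented for this example.

```python
import numpy as np

# Hypothetical placeholder values, NOT the project's data.
pct_inactivity = np.array([14.0, 16.5, 18.0, 20.5, 23.0])
pct_diabetics = np.array([7.1, 8.4, 9.0, 10.2, 11.5])


def fit_line(x, y):
    """Fit y = slope * x + intercept by least squares.

    Returns the slope, intercept, R-squared, and residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = slope * x + intercept
    residuals = y - predicted
    ss_res = np.sum(residuals ** 2)        # variation the line leaves unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in y
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared, residuals


slope, intercept, r_squared, residuals = fit_line(pct_inactivity, pct_diabetics)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
print("residuals:", np.round(residuals, 3))
```

Residuals that are large relative to the rest, or that show a clear pattern when plotted against %inactivity, would be the kind of anomaly the residual check is meant to reveal.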
Furthermore, we included histograms and density plots, which showed how the values of each variable were distributed across the dataset. Together, these tools were intended to give a fuller picture of the relationship between the percentage of diabetics and the percentage of inactive individuals. In short, our approach combined careful data collection, statistical analysis with NumPy, and visualization to examine the connection between %diabetics and %inactivity and to improve our understanding of diabetes rates.
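The numbers behind a histogram and a density plot can be sketched with NumPy alone. The example below is an assumption about how such plots might be prepared, not the project's actual code: the data values are hypothetical placeholders, `gaussian_kde_estimate` is a hand-written helper invented for this example, and the bandwidth of 1.0 is an arbitrary illustrative choice.

```python
import numpy as np

# Hypothetical placeholder values, NOT the project's data.
pct_diabetics = np.array([7.1, 8.4, 9.0, 10.2, 11.5])

# Histogram normalized so that the bar areas sum to 1.
heights, edges = np.histogram(pct_diabetics, bins=4, density=True)
hist_area = np.sum(heights * np.diff(edges))


def gaussian_kde_estimate(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on the points in `grid`."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One row per grid point, one column per observation.
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth


grid = np.linspace(0.0, 20.0, 2001)
density = gaussian_kde_estimate(pct_diabetics, grid, bandwidth=1.0)

# Riemann-sum approximation of the area under the density curve.
kde_area = np.sum(density) * (grid[1] - grid[0])
print(hist_area, kde_area)
```

Both areas should come out close to 1, which is a quick sanity check that the histogram heights and the density curve are on a comparable scale before plotting them together.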