As we explore regression analysis, it's time to get hands-on with the Ordinary Least Squares (OLS) method. Your upcoming lab report is a valuable opportunity to apply the knowledge and skills you acquired in the previous lab session. Now, let's look at the tasks waiting for you:
Deadlines 📅: Always be aware of the submission date and time. Mark them in your calendar or planner. It’s a good idea to set a reminder a day or two before the due date as an extra precaution. The session deadline is September 26th at 11:59 pm.
Follow the Format 📄: Please make sure you follow the specified format: a PDF file with the defined sections. You will generate your PDF using the Jupyter Notebook export tool. I also recommend reviewing the following blog entry on writing technical reports.
Label your files clearly: Student1-Student2-LabSession-Date.PDF (e.g., GerardoMarx-HomerSimpson-OLS-Oct30.PDF).
Double-Check Your Work 🔍: Before submission, review your answers. Make sure you've completed all the required tasks. Proofread for spelling and grammatical errors.
Tasks for the OLS Lab Session Report ✅: To make sure you complete all the assigned tasks for this lab session, create a to-do list and mark each task as completed as you progress. The tasks are listed and detailed below:
☑️ Theoretical Background and Mathematical Procedure: Add a theoretical background section to your report, explaining the concepts behind the OLS method. Utilize LaTeX to present the mathematical procedure for obtaining the OLS parameters ($\theta_0$ and $\theta_1$) clearly and concisely.
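As a reference point for this section (not a substitute for your own step-by-step derivation), the standard closed-form result for the line $\hat{y}_i=\theta_0+\theta_1 x_i$ follows from setting both partial derivatives of the SSR to zero, where $\bar{x}$ and $\bar{y}$ are the sample means:

```latex
\begin{aligned}
SSR(\theta_0,\theta_1) &= \sum_{i=1}^{N}\left(y_i-\theta_0-\theta_1 x_i\right)^2\\
\frac{\partial SSR}{\partial \theta_0} &= -2\sum_{i=1}^{N}\left(y_i-\theta_0-\theta_1 x_i\right)=0\\
\frac{\partial SSR}{\partial \theta_1} &= -2\sum_{i=1}^{N}x_i\left(y_i-\theta_0-\theta_1 x_i\right)=0\\
\theta_1 &= \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{N}(x_i-\bar{x})^2},
\qquad \theta_0=\bar{y}-\theta_1\bar{x}
\end{aligned}
```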
☑️ Develop a Python function that generates data sets using the np.linspace and np.random.randn functions, with the fixed theta values $\theta_0=3.7894$ and $\theta_1=6.7898$. Name the function MyData(N,D), where the parameters N and D are the number of points and the dispersion, respectively.
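A minimal sketch of such a generator is shown below. It assumes the dispersion D simply scales np.random.randn; the x range is not fixed by the assignment, so the x_min and x_max parameters (default [0, 1]) are hypothetical additions you may change.

```python
import numpy as np

# Fixed "true" parameters given in the assignment.
THETA0 = 3.7894
THETA1 = 6.7898


def MyData(N, D, x_min=0.0, x_max=1.0):
    """Return N points (x, y) on y = THETA0 + THETA1*x plus Gaussian noise.

    N: number of points.
    D: dispersion, the factor that multiplies np.random.randn.
    x_min, x_max: hypothetical x range (an assumption, not part of the task).
    """
    x = np.linspace(x_min, x_max, N)   # N evenly spaced inputs
    noise = D * np.random.randn(N)     # zero-mean noise with standard deviation D
    y = THETA0 + THETA1 * x + noise
    return x, y
```

For reproducible plots, call np.random.seed(...) once before generating data, then e.g. x, y = MyData(100, 1.0).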
☑️ Generate data sets for the following scenarios. Dataset 1: a varying number of points (10, 100, 1,000, 10,000, 100,000, and 1,000,000) with a fixed standard deviation $\sigma=1$. Dataset 2: a fixed number of points, $N=30$, with varying standard deviations (the factor that multiplies np.random.randn) of $\sigma=$ 0.1, 0.5, 1, 1.5, and 2.
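One possible way to organize both datasets is a dictionary keyed by the varied parameter. This sketch includes a compact copy of the MyData generator described above so it runs on its own; the x range [0, 1] and the seed value are assumptions.

```python
import numpy as np

THETA0, THETA1 = 3.7894, 6.7898


def MyData(N, D):
    # Compact version of the generator described above; x range [0, 1] is assumed.
    x = np.linspace(0.0, 1.0, N)
    y = THETA0 + THETA1 * x + D * np.random.randn(N)
    return x, y


np.random.seed(42)  # fix the seed so the datasets are reproducible

# Dataset 1: varying number of points, fixed sigma = 1.
sizes = [10, 100, 1_000, 10_000, 100_000, 1_000_000]
dataset1 = {N: MyData(N, 1.0) for N in sizes}

# Dataset 2: fixed N = 30, varying sigma.
sigmas = [0.1, 0.5, 1.0, 1.5, 2.0]
dataset2 = {s: MyData(30, s) for s in sigmas}
```

Keying by N or sigma keeps the later SSR, timing, and table steps a simple loop over dataset1.items() or dataset2.items().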
☑️ Develop a function that computes the Sum of Squared Residuals (SSR) using the formula discussed in lectures: $SSR=\sum_{i=1}^{N}(y_i-\hat{y}_i)^2$.
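A direct translation of the formula could look like this; the function name SSR is our choice, and NumPy is used so the same code handles lists and arrays.

```python
import numpy as np


def SSR(y, y_hat):
    """Sum of Squared Residuals: sum over i of (y_i - y_hat_i)^2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y - y_hat          # one residual per data point
    return float(np.sum(residuals ** 2))
```

Here y_hat would come from your fitted parameters, e.g. y_hat = theta0 + theta1 * x.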
☑️ SSR Evolution Plot: Compute and plot the evolution of SSR using Dataset 1 (number of points vs. SSR) and Dataset 2 (standard deviation vs. SSR).
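The sketch below computes both SSR curves. To stay self-contained it uses np.polyfit (degree 1) as a stand-in for your own OLS implementation and a local make_data in place of MyData (x range [0, 1] assumed); swap in your functions for the report. Matplotlib is imported only inside the plotting helper.

```python
import numpy as np

THETA0, THETA1 = 3.7894, 6.7898


def make_data(N, D):
    # Stand-in for MyData(N, D); the x range [0, 1] is an assumption.
    x = np.linspace(0.0, 1.0, N)
    return x, THETA0 + THETA1 * x + D * np.random.randn(N)


def fit_and_ssr(x, y):
    # np.polyfit with degree 1 returns [slope, intercept] = [theta1, theta0].
    theta1, theta0 = np.polyfit(x, y, 1)
    y_hat = theta0 + theta1 * x
    return float(np.sum((y - y_hat) ** 2))


def ssr_vs_points(sizes, sigma=1.0):
    """Dataset 1: SSR for each number of points N."""
    return [fit_and_ssr(*make_data(N, sigma)) for N in sizes]


def ssr_vs_sigma(sigmas, N=30):
    """Dataset 2: SSR for each standard deviation."""
    return [fit_and_ssr(*make_data(N, s)) for s in sigmas]


def plot_ssr(sizes, ssr_points, sigmas, ssr_sigma):
    import matplotlib.pyplot as plt  # only needed for plotting
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.loglog(sizes, ssr_points, "o-")
    ax1.set_xlabel("Number of points N")
    ax1.set_ylabel("SSR")
    ax1.set_title("Dataset 1 (sigma = 1)")
    ax2.plot(sigmas, ssr_sigma, "o-")
    ax2.set_xlabel("Standard deviation sigma")
    ax2.set_ylabel("SSR")
    ax2.set_title("Dataset 2 (N = 30)")
    fig.tight_layout()
    return fig
```

For example, fig = plot_ssr(sizes, ssr_vs_points(sizes), sigmas, ssr_vs_sigma(sigmas)), then display it in the notebook or save it with fig.savefig. Log-log axes are used for Dataset 1 because N spans six orders of magnitude.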
☑️ Create a function that measures the time effort (execution time) of both OLS methods (the for-loop version and the NumPy version) using Python function definitions.
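A sketch of the two OLS variants and a timer follows. It assumes "For" means the closed-form estimator written with explicit Python loops and "NumPy" means the same formula vectorized; your lecture versions (e.g. a matrix formulation) may differ. The names ols_for, ols_numpy, and time_effort are hypothetical, and time.perf_counter is used because it is the standard-library high-resolution timer.

```python
import time
import numpy as np


def ols_for(x, y):
    """Closed-form OLS thetas computed with explicit Python for loops."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    num = 0.0
    den = 0.0
    for xi, yi in zip(x, y):
        num += (xi - x_mean) * (yi - y_mean)
        den += (xi - x_mean) ** 2
    theta1 = num / den
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1


def ols_numpy(x, y):
    """Same closed form, vectorized with NumPy."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    theta1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
    theta0 = y.mean() - theta1 * x.mean()
    return theta0, theta1


def time_effort(method, x, y, repeats=3):
    """Return (best wall-clock time in seconds over `repeats` runs, method result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = method(x, y)
        best = min(best, time.perf_counter() - start)
    return best, result
```

Taking the best of several runs reduces noise from other processes; the standard-library timeit module is an alternative.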
☑️ Time Effort Plot: Compute and plot the execution time of both OLS methods for every data set size in Dataset 1.
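A generic timing loop over the Dataset 1 sizes might look like the sketch below. It takes the OLS methods as a dictionary of callables, so any For and NumPy implementations you write can be plugged in; time_over_sizes and plot_times are hypothetical helper names.

```python
import time


def time_over_sizes(methods, sizes, make_data, repeats=3):
    """Return {method name: [best time in seconds for each N in sizes]}.

    methods: dict mapping a label to a callable f(x, y) -> (theta0, theta1).
    make_data: callable N -> (x, y), e.g. MyData with a fixed dispersion.
    """
    times = {name: [] for name in methods}
    for N in sizes:
        x, y = make_data(N)  # same data for every method at this size
        for name, method in methods.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                method(x, y)
                best = min(best, time.perf_counter() - start)
            times[name].append(best)
    return times


def plot_times(sizes, times):
    import matplotlib.pyplot as plt  # only needed for plotting
    fig, ax = plt.subplots()
    for name, values in times.items():
        ax.loglog(sizes, values, "o-", label=name)
    ax.set_xlabel("Number of points N")
    ax.set_ylabel("Time (s)")
    ax.set_title("OLS time effort on Dataset 1")
    ax.legend()
    return fig
```

Usage, with hypothetical function names: times = time_over_sizes({"For": my_for_ols, "NumPy": my_numpy_ols}, sizes, lambda N: MyData(N, 1)). Expect the for-loop version to take noticeably longer at 1,000,000 points, so a lower repeats value may be convenient there.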
☑️ Comparison Table of Theta Values: Create a comparison table showing the computed theta values (both methods) against the real thetas for each dataset.
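A dependency-free way to build the table is sketched below; it expects (dataset label, method name, theta0, theta1) tuples collected from your own fits and adds absolute errors against the true values. The function name and layout are assumptions; if pandas is available, a DataFrame works just as well.

```python
def theta_table(results, theta0_true=3.7894, theta1_true=6.7898):
    """Plain-text table comparing estimated thetas with the true values.

    results: list of (dataset label, method name, theta0_hat, theta1_hat) tuples.
    """
    header = f"{'Dataset':<14}{'Method':<8}{'theta0':>10}{'theta1':>10}{'|err0|':>10}{'|err1|':>10}"
    lines = [header, "-" * len(header)]
    # Reference row with the true parameters.
    lines.append(f"{'true':<14}{'-':<8}{theta0_true:>10.4f}{theta1_true:>10.4f}{'':>10}{'':>10}")
    for label, method, t0, t1 in results:
        err0 = abs(t0 - theta0_true)
        err1 = abs(t1 - theta1_true)
        lines.append(f"{label:<14}{method:<8}{t0:>10.4f}{t1:>10.4f}{err0:>10.4f}{err1:>10.4f}")
    return "\n".join(lines)
```

Calling print(theta_table(rows)) renders the table in the notebook output, which the PDF export keeps.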
☑️ Conclusions: Based on the previous tasks, draw conclusions about how the OLS method performs with small and large data sets, with different dispersions, and in terms of time effort, using your plots as support. Provide insights into the efficiency and accuracy of the methods under various scenarios, including the number of points and dispersion.
☑️ Create a model: develop a predictive model that uses GDP per capita as input to estimate a country’s life satisfaction. Use the data provided in the repository to prepare the dataset, train the model, and then test its performance by plotting the raw data against the model’s predicted values to visually compare the fit.
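The sketch below outlines one way to load, fit, and plot the GDP versus life-satisfaction model with closed-form OLS. The repository's file name and column names are not given here, so load_xy takes them as parameters (e.g. load_xy("path/to/file.csv", "<GDP column>", "<life satisfaction column>")); all helper names are hypothetical, and the CSV values are assumed to be plain numbers.

```python
import csv
import numpy as np


def load_xy(path, x_col, y_col):
    """Read two numeric columns from a CSV file with a header row."""
    xs, ys = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return np.array(xs), np.array(ys)


def fit_line(x, y):
    """Closed-form OLS for y ~ theta0 + theta1 * x."""
    xc = x - x.mean()
    theta1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
    theta0 = y.mean() - theta1 * x.mean()
    return theta0, theta1


def predict(theta0, theta1, x):
    """Predicted life satisfaction for GDP per capita values x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)


def plot_fit(x, y, theta0, theta1):
    import matplotlib.pyplot as plt  # only needed for plotting
    order = np.argsort(x)  # sort so the model line is drawn left to right
    fig, ax = plt.subplots()
    ax.scatter(x, y, label="Raw data")
    ax.plot(x[order], predict(theta0, theta1, x[order]), "r-", label="OLS model")
    ax.set_xlabel("GDP per capita")
    ax.set_ylabel("Life satisfaction")
    ax.legend()
    return fig
```

Overlaying the fitted line on the scatter of raw data gives the visual comparison the task asks for.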
Use the Correct Submission Method 📥: The submission link is only for uploading your document and stays open until 11:59 pm on the deadline. If you need help uploading your PDF file, please watch the following video.
Stay Organized 🗂️: Keep all your assignments in a dedicated folder on your computer or in a specific notebook. Consider backing up your work on cloud storage or an external drive.
Seek Clarifications Early ❓: If you’re unsure about any requirements, ask your lecturer well in advance. Don’t wait until the last minute. You can use our Answer-Itmorelia service to post questions and ask for help.
Avoid Procrastination ⏰: Start your homework early. This gives you ample time to research, think through your answers, and ask for help if needed.
Stay Updated 📢: Sometimes, teachers might provide additional instructions or changes. Stay updated by checking your registered email regularly.
Feedback is Gold 🌟: After submitting, if you receive feedback, take it positively. It’s a chance to learn and improve for future assignments.
Support Resources 🧑🏽‍💻: To complement this session, we provide video resources that review the fundamentals of Ordinary Least Squares (OLS) regression covered previously. These videos explain step by step how the method works, how the model parameters are estimated, and how OLS can be applied to simple datasets. They are designed to reinforce the theoretical concepts with practical demonstrations, helping you connect the mathematical formulation with its implementation in Python.
Finally, everyone occasionally faces challenges with schoolwork. The key is to stay organized, ask for help when needed, and always strive to do your best. Here's to successful homework submissions and continuous learning!
Happy coding and warm regards,
Gerardo Marx
Lecturer of the Artificial Intelligence and Automation Course,