Machine Learning part 3

A Computer Science student wrote to me: “You wrote that Artificial Intelligence and Machine Learning are the high demand skills but my school does not teach these subjects. Our CS program is still focusing mostly on programming languages such as Java, C++, and JavaScript. I am disappointed but do not know what to do. Please advise.”

Answer: Machine learning is a fast-growing field, and many companies are looking for college graduates who have this skill. If your school does not offer a course in this subject, you can learn it from Massive Open Online Courses (MOOCs) on the Internet. As a CS student, you need to be active in your learning and find a way to learn what you need instead of waiting for the school to teach the subject.


If you want to learn machine learning, I suggest that you begin by taking an introductory course to understand what machine learning can and cannot do. General knowledge of the field will help you go further in your learning and develop your skills. Many students skip the introductory course and jump straight to implementation or to learning how to use tools. That is a mistake, because they only learn the “tricks” and do not know how to apply machine learning effectively. In that case, they will not go far in their careers. I often tell my students: “Learning the tools but not the principles is like eating instant noodles instead of gourmet food.”

Before you start to work on machine learning algorithms, you need to understand what problems you are solving. Ask many questions until you are sure that you know which problems you want to solve. Once you know them, you can set your parameters correctly and prioritize the questions so that you answer them one at a time instead of getting overwhelmed and confused by data. Knowing which questions need to be answered with data analysis also tells you which data to collect, and how much of it you need, to support your machine learning algorithms.

The next step is to train the machine and make sure that you have enough data for it to learn from. Without sufficient data, your machine learning will not give you the performance that you want. Do not hurry to get the algorithms to work until you have enough data. There is a lot of public data available on the Internet that you can use, but stay focused on what you want to do. Do not try to apply machine learning algorithms to every data set you can get; it will overwhelm you. Only get the data that is relevant to your questions, then check whether your machine learning yields the results you expected. Do not worry about making mistakes. Most of my students also make mistakes during this step, but they learn from them, and you should do the same. Once you have mastered this step and can get the machine to predict according to your goals, you can move on to more complex algorithms.
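
To see why the amount of training data matters, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the data is synthetic (two made-up groups of 2-D points), the model is a very simple nearest-centroid classifier, and the function names such as make_points and learning_curve are hypothetical names chosen for this example. It trains the same model on more and more data and measures the accuracy on a fixed test set.

```python
import random


def make_points(n_per_class, rng):
    """Return (point, label) pairs for two synthetic 2-D classes (assumed data)."""
    data = []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((point, label))
    return data


def fit_centroids(train):
    """'Training': learn one mean point (centroid) per class."""
    sums = {}
    for (x, y), label in train:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / c, sy / c) for label, (sx, sy, c) in sums.items()}


def predict(centroids, point):
    """Pick the class whose centroid is closest to the point."""
    x, y = point
    return min(
        centroids,
        key=lambda lab: (x - centroids[lab][0]) ** 2 + (y - centroids[lab][1]) ** 2,
    )


def accuracy(centroids, test):
    """Fraction of test points that are classified correctly."""
    correct = sum(1 for point, label in test if predict(centroids, point) == label)
    return correct / len(test)


def learning_curve(sizes, seed=0):
    """Train on each size (points per class) and score on one fixed test set."""
    rng = random.Random(seed)
    test = make_points(1000, rng)
    results = {}
    for n in sizes:
        train = make_points(n, rng)
        results[n] = accuracy(fit_centroids(train), test)
    return results


if __name__ == "__main__":
    for n, acc in learning_curve([1, 5, 50, 500]).items():
        print(f"{n:>4} training points per class -> accuracy {acc:.3f}")
```

With only one or two points per class, the result depends heavily on which points happen to be drawn; with hundreds of points, the accuracy becomes stable. Real problems are harder than this toy example, but the same check (train on more data, measure on data held aside) is a useful habit before moving to complex algorithms.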

After you have developed your machine learning skills and know how to solve problems, the next step is to “play” with commercial tools available on the Internet. Amazon’s AWS and Microsoft’s Azure offer many data analytics tools that you can learn. You can use their sample data to get familiar with the tools, then use your own data to train them to do what you need. Machine learning is only as good as the data that you give it. To build your skills in machine learning technology, you need to start with the best data from which the machine can learn.

Sources

  • Blogs of Prof. John Vu, Carnegie Mellon University