The woods are lovely, dark and deep. But I have miles to code, so I can’t sleep.
To tell your PC what it must do, you have to speak its language. Unlike human languages, the languages understood by machines are highly structured and well-defined. Every instruction given to your computer has to be a mathematical or logical statement: performing a mathematical operation, telling it where to start and where to go when it reaches some point, or telling it to repeat an operation a number of times.
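As a rough illustration of those three kinds of instruction, here is a minimal sketch in Python (the language this post settles on later). The variable names and numbers are made up for the example.

```python
# A mathematical operation: add two numbers.
total = 7 + 5

# A logical decision: choose where to go next based on a condition.
if total > 10:
    message = "total is greater than 10"
else:
    message = "total is 10 or less"

# Repetition: perform the same operation a fixed number of times.
squares = []
for n in range(1, 4):
    squares.append(n * n)

print(total)    # 12
print(message)  # total is greater than 10
print(squares)  # [1, 4, 9]
```

Every program, however large, is built out of statements like these.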
More than just a hobby?
For many people, the journey of writing cryptic instructions begins in high school. They follow a textbook, full of colored text and grayscale images, which gives end-to-end instructions on developing some toy programs in C++. Any experienced programmer will agree that this is equivalent to barely raising your foot for your first step into this multidimensional world. While most people get into competitions and hackathons or take this up as a career, many say that programming is simply their hobby.
Truth be told, it is no less than art: a whole new way of thinking about the world around you. It is an amazing experience to read beautiful, comprehensible code as compared to a disheveled mess of characters with no spacing between or after them. The way a program works depends heavily on the way data is stored while it runs; this is so important that entire courses in CSE curricula are dedicated to efficient storage and flow of data. Practitioners can spend weeks or months deciding the structure of a program before typing even the first line. And even then, not every program runs on the first go. When hours of debugging fix one error, more of them pop up. Working, well-optimized code for an end-to-end project is no less than a masterpiece.
Passion, dedication and patience (along with plain enthusiasm) are necessary to reach high levels of proficiency and satisfaction. It is normal to find the walk down this road tough and tiring, because it is.
So what does my PC understand?
Not surprisingly, there would be a large communication gap between you and your machine if you set out to learn the exact language it understands. All your instructions are seen by your PC as a sequence of 1’s and 0’s. Each 0 or 1 is called a “bit”, the smallest unit of information; you can think of it as a switch that is either on (1) or off (0). It’s a herculean task to write out 0’s and 1’s as instructions on our own, so we have Assembly Language. It has more English in it, and some instructions start to make sense. However, this language requires you to know how your I/O bus and registers interact, which is quite painstaking again. So, we moved on to High-Level Languages. Humans can use and understand these languages quite easily, which eases both writing code and debugging (which means finding errors and correcting them). They include languages like C, C++, Python, C#, Java, R, etc. Some languages are built on their predecessors, with added functionality.
Source: The Bit Theories, https://thebittheories.com/levels-of-programming-languages-b6a38a68c0f2
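To make the idea of bits a little more concrete, the short Python sketch below shows how a number and a letter look as 0’s and 1’s. The values 5 and "A" are arbitrary choices for illustration, and 8 digits are used only to keep the output tidy.

```python
number = 5
letter = "A"

# format(value, "08b") writes an integer as 8 binary digits, padded with zeros.
number_bits = format(number, "08b")

# ord() gives the integer code of a character, which is then written as bits.
letter_bits = format(ord(letter), "08b")

print(number_bits)  # 00000101
print(letter_bits)  # 01000001

# int(text, 2) converts a string of bits back into the integer.
print(int(number_bits, 2))  # 5
```

A high-level language does this translation for you, so you rarely have to think in bits at all.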
Choose your language wisely!
Since this article is a continuation of the Artificial Intelligence and Machine Learning Primer, we will look at languages suitable for such applications. Any machine learning model is a series of mathematical operations (often involving statistics and calculus), so in principle it can be implemented in any language. Our focus, however, is to get results from our data and make inferences; we will not waste our time reinventing the wheel.
Python is widely considered the most popular language for general machine learning applications. Its popularity comes from its versatility, its intuitive syntax (which is the language’s grammar, to put it simply), the vast functionality it provides and a large online community that can help you out when you’re stuck. Sophisticated software that can help you write and run Python programs is available, often free of cost for student use. Many commands required for computations related to matrices, vectors and statistical quantities have been compiled into bunches of code called libraries. These libraries are open-source and can be used directly from Python. Amazingly, there’s a library out there for almost anything you might want to do with Python, be it machine learning, web development, database management or even sending emails.
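Using a library usually takes a single `import` line. The sketch below uses `statistics`, a small module that ships with Python, as a stand-in for the larger third-party libraries. The marks are made-up sample data.

```python
import statistics  # ships with Python, so nothing extra needs installing

marks = [72, 85, 90, 66, 85]  # made-up sample data

print(statistics.mean(marks))    # 79.6
print(statistics.median(marks))  # 85
print(statistics.mode(marks))    # 85
```

Third-party libraries are used the same way once they are installed. Only the name after `import` changes.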
Another popular language is R, which is supported by the R Foundation for Statistical Computing. As the organization’s name suggests, this language is designed specifically for statistical analysis, which also makes it a good fit for machine learning applications. Its syntax is reasonably easy to pick up if you know Python. It also has a considerably large community, and is often used by professionals. C++ is arguably the most popular Object Oriented Programming Language (OOPL) in modern industries. However, one needs to have good knowledge of data structures and algorithms before this language can be beneficial. In the rest of this post, we will discuss learning Python.
School of Hard Knocks
If you google learning python, you will be shown results, most of which might fall into one of these categories:
- Online courses by websites like Coursera, DataCamp, Udemy, Udacity, edX, etc.
- Books, often titled “Python” or “Python for Data Science”, with attractive cover pages, published by software engineers
- Articles posted on sites like Quora, Medium or Towards Data Science
The best way to proceed varies from person to person. Though most of these paths will fulfill their purpose, we would like to go with something that saves time. For example, online courses might sometimes venture into technicalities not useful for machine learning. Books sometimes describe commands and syntax in such immense detail that readers might lose motivation. Articles often talk about how someone completed an amazing online course, or recommend books. It helps to get advice on which books or courses to choose.
My journey of learning Python started off when my second year began, with this course by DataCamp. DataCamp delivers video lectures and has its own editor for students to practice on. It’s a great primer for anyone new to Python; at the end, you will have enough working knowledge to run a few programs on your own. I find it difficult to read books to learn languages, so I stayed away from them. Once the foundation is laid, the quickest way (for me) to learn the language further is through what I call targeted querying. Come up with a problem you want to code, and Google that very problem. Here, you will be introduced to websites like Stack Overflow and GeeksforGeeks. They are extremely useful for all programmers, because most of your doubts will already have been solved by someone. YouTube is another important source of answers for your targeted queries, with video support. Keep repeating this process with newer problems every day. Keep adding more weapons to your arsenal this way.
It is important to keep practising and use what you have learnt. Taking long breaks while learning a language will make you forget its syntax, just as would happen if you didn’t speak a human language for long. Not very far into your journey, you will realize that efficient use of Python libraries is key to learning this language. Keep reading about the most useful libraries of Python, some of which include:
- NumPy : Extremely useful for general computing and linear algebra applications.
- SciPy : A very large library, which can help you with statistics, image processing, audio processing and much more.
- Pandas : An extremely versatile and useful library for data scientists. Makes dealing with data super easy in the form of tables, called DataFrames in its terminology.
- Matplotlib : Data visualization library that helps you make graphs and plots and display images. Seaborn is a higher-level library built on this one; Bokeh is a separate library for interactive plots.
- Scikit-learn (sklearn) : Probably the most widely used library for machine learning applications. Many machine learning projects can be built entirely using only this library.
- Keras (built on TensorFlow) : An easy-to-use deep learning library that can help you construct perceptron-based neural networks (ANN, CNN, LSTM, etc.) easily. Its dependency, TensorFlow, was developed by Google and is a little difficult to learn.
- OpenCV (cv2) and Pillow (imported as PIL) : Popular image processing libraries.
There’s a vast universe of libraries out there; you are encouraged to discover those on your own. Google will be immensely useful in your journey, so make good use of it.
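Before using any of the libraries listed above, it helps to know which ones are already installed on your machine. The sketch below is one simple way to check, using only Python’s standard library. The list holds the names used in `import` statements, which sometimes differ from the library’s own name (for example `sklearn` for Scikit-learn).

```python
from importlib.util import find_spec

# Import names for the libraries listed above (the name used after `import`).
libraries = ["numpy", "scipy", "pandas", "matplotlib", "sklearn",
             "tensorflow", "keras", "cv2", "PIL"]

# find_spec returns None when a top-level module cannot be found.
installed = {name: find_spec(name) is not None for name in libraries}

for name, found in installed.items():
    print(name, "installed" if found else "missing")
```

What it prints depends on your setup. Anything reported as missing needs to be installed before you can import it.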
Snakes (Python) on the PC
In this last section, we’ll see how we can begin coding on our own PC. For this, we need something called an Integrated Development Environment (IDE). This software is built to help edit, interpret and run your code.
- If you are beginning your journey, start off with Spyder, which is available within Anaconda’s Python 3 distribution. This link will instruct you on the installation procedure. This is a simple plug-and-play IDE where you can write code and run it to get immediate results.
- If you have good knowledge of Python and are ready to work with projects (multiple connected files) instead of individual scripts, consider PyCharm by JetBrains. This is a very powerful IDE with immensely useful features like refactoring, code completion and line by line debugging (don’t worry if this is new to you; they are advanced features). The installation can be confusing, so refer to this video series or ask experienced programmers for help.
- If you move towards data science competitions, you will need to run chunks of code separately (for exploring your data), instead of running the entire program all over again. For this, Jupyter Notebooks are the best option. Jupyter is available within Anaconda’s distribution, and runs in a web browser.
- Many popular websites which host data science competitions allow you to create Jupyter Notebooks on their site in your account. Kaggle is one such website. The benefit of doing this is that most libraries in their IDEs are pre-installed. Offline installation of some libraries can be frustrating at times. The downside is reduced privacy: it is possible that other users from around the world can view and use your work.
That should get you up and running. Remember to keep learning new stuff and keep practising so that you do not lose touch with your language!
Science Deconstructed is a series which aims to introduce some exciting advancements made by the scientific community in simple terms with guidance on how to pursue these fields in the institute, along with features of research by groups in the institute. Send in your requests for the same at firstname.lastname@example.org. Suggestions/ comments are always welcome.
Series by: Sankalpa Venkatraghavan