Why Your Business Needs to Start Collecting Data Right Away

Data analysis is the number one requirement for starting any data-driven project. When BSUPERIOR SYSTEM builds an application on top of data, we need to make sure that the data is good enough and that there is enough of it. What does that mean? In this article, we will go through some of the steps you can expect in a typical meeting with programmers and software developers, and explain why we at BSUPERIOR SYSTEM recommend that your business start collecting data as soon as possible.


The first step is a good question. It is an essential part of any project, and there are two main types of questions:

  • Prediction – a good example is the real estate industry: we have the features of a property and we want to predict its price using the collected data. With prediction, we have to find the best way to get from point A (the data that we have) to point B (what we want to predict) with the fewest possible errors.
  • Classification – examples: identify whether there is a man, a dog, or a cat in a picture; mark a written review as positive or negative; identify a client segment based on purchase information. In this case, we have a fixed set of classes that we want to tell apart. The important thing here is to create mutually exclusive classes. Imagine classifying pictures as “cat” or “man” and finding that 80% of the pictures show a man holding a cat. It is not impossible, but it requires much more effort to implement.
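
Before training anything, it can help to measure how exclusive the classes really are. The sketch below is only an illustration: the function name `ambiguous_share` and the sample annotations are hypothetical, and it assumes each picture has already been annotated with everything visible in it.

```python
from typing import List, Set


def ambiguous_share(annotations: List[Set[str]]) -> float:
    """Return the fraction of examples that fit more than one class."""
    if not annotations:
        return 0.0
    ambiguous = sum(1 for classes in annotations if len(classes) > 1)
    return ambiguous / len(annotations)


# Hypothetical annotations: what is visible in each of five pictures.
pictures = [
    {"man", "cat"},
    {"man", "cat"},
    {"man", "cat"},
    {"man", "cat"},
    {"dog"},
]

share = ambiguous_share(pictures)
print(f"{share:.0%} of pictures fit more than one class")
```

A high share is a signal to rethink the classes (or the question) before any model is built.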


In order to keep everything grounded, we will go through my research project “Identify Dividend Aristocrats”. This is a classification problem: I want to show a machine learning model some data and have it figure out whether a firm can be considered a dividend aristocrat or not. Why would we be interested? Because they are great to invest in (see picture below).


Now that we know the problem, we can start looking for data to solve it. My idea was that it is possible to make some progress with financial statements alone. So we show the application financial statements and some basic valuations, and receive an answer.


Of course, financial statements alone would not be sufficient to get the answer with high accuracy, but here are a couple of factors to consider: this is just the first step; we end up with a proof of concept; and a tool with low precision can still be useful if it has a high level of recall.


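Recall and precision are two important terms, and they are pretty easy to understand:

  • Precision – is how often we are right when we say “Yes”. In our case, “Yes” means the application says that a firm is a dividend aristocrat, so precision is the share of those firms that really are aristocrats.
  • Recall – is how many of the real dividend aristocrats we manage to find. With high recall we rarely miss one, so when the application says “No”, we can be fairly confident that the firm is not a dividend aristocrat.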
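
To make the two terms concrete, here is a minimal sketch using the standard definitions: precision is true positives divided by all predicted “Yes” answers, and recall is true positives divided by all actual “Yes” cases. The function name and the ten sample firms are hypothetical and are not taken from the research project.

```python
from typing import List, Tuple


def precision_recall(actual: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) for a yes/no classifier."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    # Guard against division by zero when there are no "Yes" answers or cases.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical sample: 10 firms, of which the first 2 are real aristocrats.
actual = [True, True] + [False] * 8
# The application says "Yes" to 6 firms, including both real aristocrats.
predicted = [True] * 6 + [False] * 4

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this made-up sample, only a third of the “Yes” answers are right, but no real aristocrat is missed, so the four firms marked “No” can be dropped from the search safely.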


It is tough to achieve high precision, since there are obviously more factors involved. But high recall would give us a tool that helps filter unwanted firms out of the search. Since we are looking at easy-to-access data (simply financial statements), it would be easy to get new data when we want to reuse the project in the future. And it is obviously possible to add more factors (financial and non-financial) later to increase precision.


The big difference is that we will have something to compare our progress to, so we will actually know whether the factors that we add are relevant.


Finally, we want to find the data. In this case, we are dealing with something publicly available (Kaggle). That is extremely important to remember. Humankind is generating massive amounts of data; 90% of it was generated in recent years (IBM). Moreover, a lot of the time we can enhance performance with something that is publicly available.


However, many companies will get stuck at this last stage. Public data can be used, but something specific to your firm will always be required. Just start thinking: “What would be beneficial for my firm to predict?” and “What do I (or someone else) do regularly that can be recorded to support it?”


After that, a data engineer can start applying different machine learning models to the data. Many times the only way to see how something performs is to implement it, so that process can be lengthy. We will go over report samples and look in detail at how exactly we know whether the model we have is reliable.


In conclusion, there is no way around data collection. It is a lengthy process, and with more and more applications becoming AI-based and data-driven, firms that have this data ready will have a significant advantage. Let us know your thoughts. Feel free to contact us and let’s have a chat about what you want to predict.


Next week we will go deeper into machine learning concepts and address deep learning and neural nets.

What is Machine Learning and How does it relate to small business?

What is Machine Learning?

Machine learning can be viewed as developing an application with minimal programming. The everyday apps that users run on their phones, tablets, and laptops are developed through the following process:

  • Someone comes up with an idea for the application.
  • S/he calls someone who can do the programming (write the idea in a language that the computer understands).
  • S/he describes the set of steps that the application needs in order to work. For example, to check whether our financial statements are correct, we take assets, subtract liabilities and equity, and we should get zero.
  • We continue to explain more and more rules, and end up with a system that behaves in a predictable way.
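
The financial statement check mentioned in the list above is a typical hand-written rule. A minimal sketch of it could look like this; the function name and the rounding tolerance are assumptions made for illustration.

```python
def statement_balances(assets: float, liabilities: float, equity: float,
                       tolerance: float = 0.01) -> bool:
    """Rule written by a programmer: assets - liabilities - equity should be zero."""
    # A small tolerance (an assumption here) absorbs rounding in reported figures.
    return abs(assets - liabilities - equity) <= tolerance


print(statement_balances(500.0, 420.0, 80.0))  # balanced statement
print(statement_balances(500.0, 420.0, 90.0))  # equity overstated by 10
```

Every further check has to be spelled out by hand in the same way, which is what makes the traditional process predictable but slow.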

With machine learning it is different! Without getting into any technical explanation, here is my view on it. Let’s say you want to teach two new employees how to check financial statements. You give one of them all (or many) of the financial statements that you have and tell them, “Figure it out, there is enough information here”. Then you go to the other employee and explain everything to him or her.


We can clearly see that the first employee will be inefficient and have a harder time understanding financial statements, while the second employee will have a better chance of making sense of them. That is exactly the reason why this technology did not take off 10 years ago. Now imagine that the first employee is a computer, which does not forget, can process millions of numbers, and does not get tired. The only things we were missing were a lot of data (thanks to the internet, we now have a lot of it for basically everything) and cheap processing power (thanks to the gaming industry and bitcoin for cheap and powerful GPUs). And as you may have noticed, the whole learning process happens on the computer, without you or a huge team of programmers doing it.
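
As a toy illustration of the “figure it out” approach, the sketch below is never told the balance rule (assets minus liabilities minus equity equals zero). It only sees example statements and fits equity as a weighted sum of assets and liabilities. Ordinary least squares is used here purely as a simple stand-in for a learning algorithm; it is not the method from the research project, and the function name and sample figures are hypothetical.

```python
from typing import List, Tuple

Statement = Tuple[float, float, float]  # (assets, liabilities, equity)


def learn_equity_rule(statements: List[Statement]) -> Tuple[float, float]:
    """Fit equity = a * assets + b * liabilities by ordinary least squares."""
    s_aa = sum(a * a for a, _, _ in statements)
    s_al = sum(a * liab for a, liab, _ in statements)
    s_ll = sum(liab * liab for _, liab, _ in statements)
    s_ae = sum(a * e for a, _, e in statements)
    s_le = sum(liab * e for _, liab, e in statements)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_aa * s_ll - s_al * s_al
    if det == 0:
        raise ValueError("examples are too similar to learn from")
    coef_assets = (s_ae * s_ll - s_le * s_al) / det
    coef_liabilities = (s_le * s_aa - s_ae * s_al) / det
    return coef_assets, coef_liabilities


# Hypothetical example statements; nobody tells the computer the rule.
examples = [
    (100.0, 40.0, 60.0),
    (250.0, 100.0, 150.0),
    (80.0, 50.0, 30.0),
    (500.0, 420.0, 80.0),
]

coef_assets, coef_liabilities = learn_equity_rule(examples)
print(f"equity = {coef_assets:.2f} * assets + {coef_liabilities:.2f} * liabilities")
```

The fitted weights come out close to 1 and -1, which is the same rule a programmer would have written by hand, recovered from the examples alone. Real projects use far more data and more flexible models, but the division of labor is the same.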


As a result, we can build bigger applications faster and cheaper. There are some issues with accuracy in this approach, but the traditional approach has its own issue with “bugs”, so this part remains the same.


Importance of machine learning and AI based projects.

AI and Machine Learning are supported by big players, such as:

  • KPMG – published the report “Rise of the humans”, in which the company discusses how increased digital labor will affect the job market over the next 8 years. The picture below is a great visual representation of the trend. As we can see, the bubble on the bottom left will keep shrinking as automation and robotization continue to drop in price.
  • Google (Alphabet) – announced a change in strategy from mobile first to AI first during “Google I/O 2017”. Google is actually an extremely good example: Sergey Brin (co-founder of Google) gave an interview during the World Economic Forum, where he said that until fairly recently Google had not taken seriously the technologies that are at the core of modern techniques.
Figure: KPMG AI and machine learning statistics, from “Rise of the humans”.

It is easy to imagine that all these technologies demand astronomical investments and are only accessible to huge firms. That is not the case: my favorite example is how a Japanese cucumber farmer created a sorting mechanism for himself.


What is the point of Machine Learning For Business?

In our coming series of articles, we will go through actual projects in different industries. Our goal will be to explain the results that you can expect (what will be in your report, and what is or is not possible to do). We will tailor all material to be an easy read for everyone.


After seeing some examples, you will have a better understanding of what is going on in the AI field. Coming up next is a description of data research. When any data-driven project is about to start, an engineer needs to go through the data. What happens during that step, and why it might take up to 90% of the time spent on the project, is coming up next week.


Please comment and let us know what you are interested in and which subject or industry you want to explore in more depth. Share, comment, or contact us in any way you like.