Conclusion: Employment Analysis
This page presents the conclusions and insights the application produced in the field of employment. A deep-dive analysis was performed on different sectors of the market, with a focus on specific companies (namely Amazon, Microsoft and Google). The page revisits some of the most interesting questions answered over the course of this portfolio. Since the portfolio has dealt at length with the employment scenario, you may be wondering whether the insights found here can help you make the right decision about your future job. Let's find out:
The portfolio first attempts to find the must-have features for predicting the sector of a firm. This was done using labelled record data. For text data, the firm itself was predicted using information about the salary offered to an employee.
1 Background
Switching companies depends on many factors, such as salary, work culture and the benefits provided. All of these depend on how a company performs on a day-to-day basis, and firms belonging to different sectors have different performance statistics. For example, when a firm is about to go public, it means not only that it will launch its IPO but also that it had the revenue to do so. And how was this revenue generated? It depends on how much capital the firm gained by providing its services or products in the market, minus the costs incurred (salaries paid to employees, which depend on the size of the company, and other miscellaneous expenditures). The costs incurred depend heavily on which sector the firm belongs to. With this in mind, we decided to predict the sector of a firm using various features from labelled record data and by applying various classification algorithms.
Using text data, we also predicted the name of the firm from tweets related to salaries. No two firms offer exactly the same salary to a candidate. Employees also talk differently about each company depending on the CTC offered, the work culture and sometimes how large the firm is. Therefore, using such quotes or tweets, we predicted the name of the firm, again by applying machine learning classification algorithms.
2 Setting the Objective
Checking if the sector of a listed job can be predicted based on attributes like Job Title, Company Rating, State, salary range of the firm and the company size (number of employees).
Checking if past statements about the firms mentioned above can be used to label previously unlabelled statements, and whether they indicate if an employee should work for an MNC.
3 Results
We applied the following ML classification algorithms:
- Naive Bayes
- Decision Trees
- SVM
For predicting the sector of a firm, that is, when working with labelled record data, the Naive Bayes classifier performs best, with an accuracy of 89.09%.
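To make the comparison concrete, here is a minimal sketch of how accuracy can be computed and used to rank models on a held-out test set. The labels and predictions below are made up purely for illustration and are not the project's results; the helper name `accuracy` is also an assumption, not something taken from the project code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty test set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up held-out labels and model predictions (illustration only).
y_true = ["Private", "Private", "Public", "Private", "Public"]
predictions = {
    "Naive Bayes":   ["Private", "Private", "Public", "Private", "Private"],
    "Decision Tree": ["Private", "Public", "Public", "Private", "Private"],
    "SVM":           ["Public", "Public", "Public", "Private", "Private"],
}

# Score every model on the same test labels and pick the highest accuracy.
scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

Scoring all models on the same held-out labels is what makes the accuracies comparable.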
For text data, interestingly, SVM performs better than Naive Bayes. One possible reason is the small size of the dataset. A similar analysis will be done when a larger set of tweets becomes available.
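As a point of reference for the text task, the Naive Bayes baseline can be sketched in a few lines of plain Python. This is a minimal multinomial Naive Bayes with add-one smoothing over a bag of words, not the project's actual pipeline: the example sentences are invented, and the function names `tokenize`, `train_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def train_nb(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Invented salary-related statements (illustration only, not real tweets).
texts = [
    "amazon offer base salary plus rsu",
    "amazon warehouse and aws salary",
    "google l4 salary and stock refresh",
    "google search team compensation",
    "microsoft azure salary band",
    "microsoft level 62 compensation",
]
labels = ["Amazon", "Amazon", "Google", "Google", "Microsoft", "Microsoft"]

model = train_nb(texts, labels)
print(predict_nb(model, "aws salary offer"))
print(predict_nb(model, "azure level compensation"))
```

With so few training statements, a single distinctive word can decide the prediction, which is consistent with the caveat above about dataset size.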
3.1 How does the size of the company play a role in determining its sector?
After looking at the PDF of the min and max sizes, it can be inferred that the plot agrees with real-world notions of firm size. At the lower end of the curve, where min size is small, the firm is likely to be a startup, so the predicted label is Private. As the min size value increases, the prediction is more likely to be Public. This matches real life: a firm that goes public generally has a lot of employees, whereas a private firm has comparatively few.
After looking at the Decision Tree plot, it can be inferred that the root node splits on the min size value (min size <= 7500), and firms that satisfy this condition belong to the “Private” class. This Decision Tree provides an ideal size of the firms and the range of salaries they offer, based on the respective splits.
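The reasoning above can be illustrated with class-conditional densities. The sketch below assumes a Gaussian model on log10(min size) per class, which is one common way such PDFs arise in Naive Bayes; the source does not state how its PDFs were produced. The employee counts are hypothetical and the function name `posterior` is an assumption.

```python
import math
from statistics import NormalDist

# Hypothetical minimum employee counts per class (not the project's data).
samples = {
    "Private": [10, 50, 200, 500, 1000],
    "Public": [5000, 10000, 20000, 50000],
}

# Fit one normal distribution per class on log10(min size).
fitted = {
    label: NormalDist.from_samples([math.log10(s) for s in sizes])
    for label, sizes in samples.items()
}

# Class priors are the share of samples in each class.
total = sum(len(sizes) for sizes in samples.values())
priors = {label: len(sizes) / total for label, sizes in samples.items()}

def posterior(min_size):
    """P(class | min size) via Bayes' rule with the fitted densities."""
    x = math.log10(min_size)
    weighted = {label: priors[label] * fitted[label].pdf(x) for label in fitted}
    norm = sum(weighted.values())
    return {label: w / norm for label, w in weighted.items()}

print(posterior(100))
print(posterior(20000))
```

On this toy data a small min size favours Private and a large one favours Public, mirroring the shape of the curve described above.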
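A decision tree chooses a root split like this by searching for the threshold that makes the two resulting groups as pure as possible. The following is a minimal sketch of that search for a single feature using Gini impurity. It assumes the Gini criterion (the source does not say which criterion was used), and the toy values are invented and deliberately chosen so that the best split lands at 7500.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Invented min-size values and sector labels (illustration only).
min_size = [50, 200, 1000, 5000, 10000, 50000]
sector = ["Private", "Private", "Private", "Private", "Public", "Public"]

threshold, impurity = best_threshold(min_size, sector)
print(threshold, impurity)
```

A real tree repeats this search over every feature and then recurses into each side of the chosen split.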
3.2 Do private firms offer more salary to their employees than the public ones?
One interesting thing to note is that the min and max salaries give a similar PDF, which suggests that private firms tend to offer a higher CTC than would be expected of them in order to attract employees. This can also be confirmed using the Decision Tree described above.
3.3 Why is salary prediction important?
Nowadays, one of the major reasons an employee switches companies is salary. Employees keep switching companies to get their expected salary, and this results in a loss for the company. With so much competition, every individual has higher expectations and goals. But the harsh truth is that a company cannot simply give everyone their desired salary.
This project can help in determining salaries, since it produces reasonable predictions using data transformation and machine learning when given information about certain features. We cannot determine the exact salary, but we can predict it from suitable data sets.
3.4 How does the total yearly compensation differ for various companies for a ‘Data Scientist’?
From the above chart, we can observe:
1. Base salaries tend to top out around $250,000, regardless of total yearly compensation.
2. Stock grants and bonuses are highly variable and can make up a large portion of a Data Scientist’s total yearly compensation.
3. Bonuses tend to be more evenly dispersed, while stock grants appear to form three clusters of points (at 90/50/20 degree angles). This could be a data quality issue, but it could also suggest considerable stock value appreciation.
3.5 WordCloud of Text Data
The word cloud for companies after cleaning data is as follows:
The word cloud for job/designation levels after cleaning data is as follows:
3.6 But what about the location? Does that affect the sectors too?
After looking at the PDF above, we can conclude that the location of the company (State, in the case of our dataset) plays a significant role when an employer decides where to set up its headquarters.
4 Conclusion
For our analysis, two labels were chosen:
- Private
- Public
Using the statistics and the EDA generated, it was found that public firms are fewer in number than private ones, which is to be expected. Apart from this, firms with an employee count below 7500 are usually private. The salary ranges also differ between the two sectors: private firms tend to offer more, along with ESOPs and other evaluations, to attract candidates.
Based on the text data, Amazon turns out to be the most talked-about firm in terms of salary and employment. It is one of the MAANG companies, and this statistic suggests that it is quite genuine and generous with the CTC it offers.
Given the ongoing layoffs, this project offers a very interesting insight. Depending on the location, rating, designation (Data Scientist, Analyst or Engineer), salary and size, it is recommended that an employee look for a Private firm that is mid-sized in terms of employee strength. Such a firm not only makes a good job offer but also provides job security as an added perk, which is by far one of the most significant factors when deciding on an offer.