Mayank Kejriwal, Research Assistant Professor at the University of Southern California
Hi, Mayank. Thank you for agreeing to share your story with us today. We’re very excited to hear more about your data science professional experience. But let’s get to know you better first. Could you briefly introduce yourself to our readers?
Hi, thanks for having me. My name is Mayank Kejriwal. I am a research assistant professor at the University of Southern California’s Department of Industrial and Systems Engineering. I am also a research lead at the USC Information Sciences Institute (ISI).
Can you tell us a bit more about ISI?
I'll be happy to. ISI is a cutting-edge research institute based in Marina del Rey, California. It has more than four decades of history behind it. One of the institute’s crowning early achievements was the invention and development of DNS, which underlies much of the modern Web’s addressing system. I work in ISI’s AI division, with a primary focus on knowledge graphs. Before joining ISI in 2016, I obtained my Ph.D. in Computer Science from the University of Texas at Austin. Prior to that, I obtained my undergraduate degrees in business and computer engineering. I’ve also interned at organizations like Rackspace, CareerBuilder and the National Center for Supercomputing Applications (NCSA).
That is quite a serious background, Mayank. There’s been some talk recently of movements like ‘AI for social good’ or ‘AI in society’. What are these in practice? Why are they different from the more ‘commercial’ AI that also arguably has ‘social’ impact?
Aside from the name, ‘AI for social good’ (as the movement is now popularly known, along with the pithier ‘AI in society’) is about applying and researching AI in the here and now for solving social problems.
The here-and-now part of this is where AI for social good departs from science fiction and futurism.
We all know that the press talks a lot about AI. And there’s a lot of speculation in those arguments. Some of it is like science fiction. There are claims about developments that many of the experts know are not happening any time soon. And there is no consensus even on when or if they will happen. State-of-the-art chatbots still don’t have anything resembling ‘common sense’, for example, despite being built on the most powerful neural networks and trained on data points from millions of individuals. But AI for social good is not futurism. In fact, many of the leading figures are already using the current technology, with all its strengths and weaknesses, to solve real problems like human trafficking or disaster response at some scale. That’s why the field is so exciting and rewarding to be in right now.
In that line of thought, you are currently doing research focused specifically on developing AI technologies for social good. What inspired you to contribute to this field in the first place?
I didn’t really make a conscious decision to develop AI for social good. But I’m grateful it came into my life. One of my first projects at the Information Sciences Institute was DIG (Domain-specific Insight Graphs), for which I built the search engine.
Can you share more details about the project?
DIG, funded by DARPA, is a search technology that uses AI and knowledge graphs to allow non-technical subject matter experts to build search engines for specific domains and use-cases like fighting human trafficking or investigating securities fraud. Because of my work on DIG, I had a rare chance to interact with investigators from human trafficking units in New York and San Francisco and learned a lot about the extent of the problem. I’ve come to realize that while technology can’t fully solve such problems, it can be extremely useful if delivered to the right people in the right way. Since then, I’ve been exploring ways to apply AI to solve more social problems.
Is that the highlight of your career you’re most proud of so far?
Yes, our work on human trafficking is the one I’m most proud of. The results of rolling out AI tools to law enforcement are clearly illustrated by some of the impact details that have been released by these agencies. For example, in the district of New York, where this technology was first rolled out, the percentage of prostitution arrests that are now being investigated for human trafficking has gone from less than 1% to more than 62% since the tool roll-out. At the same time, the number of arrests has gone down. In short, the victims get victimized less and the victimizers are more likely to be investigated.
When I see the human face behind what these numbers mean, and we have been lucky in a few instances to have obtained such details, it truly brings home the power of technology in society. It’s the human impact I’m most proud of.
And rightly so. I believe this is one of the most important aspects of AI, too. But it is yet to be utilized fully. So, hopefully, there will be more developments in that direction in the future.
Mayank, you also have more than 15 honors and awards. Can you tell us which one is closest to your heart and why?
The international best dissertation award that I received from the Semantic Web community is very special. It was the first important award that I received. And for a young researcher, support and recognition from the research community hold a special meaning. At the time, my thesis was a little different from what the community was used to, and I didn’t know how they would react to it. So it was very gratifying to receive that recognition. Some of the later awards, especially for the human trafficking work, were also very special, but this is the one that comes to mind.
Mayank, you are an experienced full-time researcher now. You also have experience as a teaching assistant. Moreover, you were working as a Graduate Research Assistant at the University of Texas at Austin until 2016. In your opinion, are there any gaps in college education in terms of preparing students for a career in data science? And what's missing from their curriculum and practical experience?
The hiring process, even as it's already more automated through the use of resume filters and other such technologies, has resulted in more organic career paths for many data science candidates.
There are success stories of people who have never gone to college, or who went to college in a different field and still succeeded as data scientists via an alternate route like self-taught coding skills or a bootcamp.
At the same time, the formal structure and prestige of a good college education simply can’t be denied by the numbers and data.
There are three things I think that colleges can do to keep their candidates attractive and viable in a job market that will tighten at some point, and is already under assault from alternative routes. First, require more technical acumen in class assignments and projects to ensure students are applying what they’re learning, rather than engaging in a ‘check the box’ style of learning. Second, actively start prepping students for the internships and interviews that are so necessary today for good full-time positions upon graduation. And third, make a networking and communications class compulsory for the students.
Mayank, in your opinion, what makes the latter point so important?
I think it’s essential for colleges to drive home the message that technical skills are only one of the skill sets necessary for succeeding as a data scientist.
Without communication skills, it will be difficult for the data scientist to rise up in the ranks and hold her own in leadership positions.
I’ve seen many students, both graduate and undergraduate, with sub-par communication skills, and if we don’t teach them, no one will. Business schools actively support and prepare a lot of their first-year students to succeed in getting good internships in the first summer, and I think we need more aggressive internship prep for engineering and computer science students as well.
Couldn’t agree more. Let’s talk a little bit about your role as a conference speaker, Mayank. Why is it important for aspiring data scientists to attend conferences in their field?
The bull’s eye for a conference attendee is to find the talks where the speaker and talk are both good. This is why conference branding is important. Some conferences are able to just attract much better speakers than others, and both the attendees and speakers know it. So, when aspiring data scientists go to good conferences, they get what’s new and hot straight from the horse’s mouth. They also learn valuable communication skills, albeit indirectly (through observing rather than doing) since the best speakers leave a strong impact, while weaker speakers are forgettable. Perhaps even more importantly, a conference brings together a community of practitioners, and there is always a lot of energy to go around.
Are conferences a good place to upgrade your technical skills?
In my view, the typical conference is not the best place to obtain or focus on technical knowledge (with some exceptions, since some conferences may be more geared specifically towards hacking). You can always pick up the technical skills later, and you’re going to need more time anyway if you want to be good at them.
Speaking of skills, Python is one of the most popular programming languages at the moment. Can you think of an exciting Python project you have worked on?
We do almost all of our work in Python. For example, the DIG project I mentioned earlier is written almost entirely in Python. In addition, the search engine I built interfaces with an Elasticsearch backend for fast indexing and search. Other than that, I love the networkx package for building and studying social networks, which has proven to be essential in our human trafficking work.
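For readers who have not used networkx, the following is a minimal sketch of the kind of network building and analysis the package supports. It is only an illustration: the account names, the links between them, and the helper functions `build_network` and `summarize` are all hypothetical and are not taken from DIG or from any real investigation.

```python
import networkx as nx

# Hypothetical toy data (an assumption for illustration only): each pair links
# two made-up accounts that share some attribute, such as a contact detail.
shared_links = [
    ("acct_a", "acct_b"),
    ("acct_b", "acct_c"),
    ("acct_a", "acct_c"),
    ("acct_c", "acct_d"),
    ("acct_e", "acct_f"),
]


def build_network(links):
    """Build an undirected graph with one node per account and one edge per link."""
    graph = nx.Graph()
    graph.add_edges_from(links)
    return graph


def summarize(graph):
    """Return the connected clusters (largest first) and the best-connected node."""
    clusters = sorted(
        (sorted(component) for component in nx.connected_components(graph)),
        key=len,
        reverse=True,
    )
    # Degree centrality: the fraction of other nodes each node is directly linked to.
    centrality = nx.degree_centrality(graph)
    most_connected = max(sorted(centrality), key=centrality.get)
    return clusters, most_connected


network = build_network(shared_links)
clusters, most_connected = summarize(network)
print("Clusters:", clusters)
print("Most connected account:", most_connected)
```

In a sketch like this, connected components group accounts that are linked directly or indirectly, and degree centrality points to the account with the most direct links, which might be a natural place for an analyst to start looking.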
Mayank, we like to finish our interviews with some tips or advice. What are the most valuable lessons you’ve learned in your career so far?
No matter where you go or what you do (with very few exceptions), you’re going to be dealing with people. People interact with technology in different ways and need technology for different things. You can learn the most valuable lessons only by talking to different people about their needs. As technologists, we tend to stake our claims on solutions rather than problems. And we’re still very driven by features and automation. But the most valuable lesson I learned is to start from the problem and ask myself some cold, hard questions. What is the simplest possible solution to this problem and why isn’t it enough? Is it possible that I’m biased towards solution X or Y because I want X or Y to succeed as opposed to just solving the problem in the most efficient way possible?