I believe in data and the logic behind it.
I am eager to thrive in any data environment with two powerful friends: Python and SQL.
I am the kind of person who seeks out complex data and wants to understand the challenges of accessing it.
MongoDB or traditional databases make no difference to me.
I enjoy looking at big-picture problems and wholeheartedly crafting detailed solutions: I like to ask questions and devise a complete answer.
I want to understand the data (not only the pipes), and I perform statistical and machine-learning analyses and build dashboards because I like it. Yes, really, because I do.
I know that I don’t know enough, and it bothers me that there isn’t enough time in the day to learn about the next topic.
I don’t sleep well at night when I leave work with a question unanswered.
I feel accountable for everything I do, and that sense of urgency has been driving me my entire life.
I wish to work in a team where I have my team's back, and the team has mine.
Santa Clara, CA
Developed an NLP (natural language processing) pipeline to automate insurance underwriters' decision process.
Integrated a dockerized rule-based expert system, UMLS (Unified Medical Language System), for feature extraction, which greatly improved disease-detection coverage from 40% to 80%.
Implemented a TF-IDF and SVM (support vector machine) model for email classification and visualized the confidence-score distribution for unseen categories; the model beats the production model in overall accuracy and robustness (a sketch of this step follows below).
Designed a statistical brand-proximity metric that evaluates the product's user engagement and the efficiency of the system's responses; a nonprovisional patent application is in progress.
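As a hedged illustration of the classification step, here is what a TF-IDF + linear-SVM pipeline could look like in scikit-learn; the sample emails and category labels are placeholders, not the production data.

```python
# A minimal sketch of a TF-IDF + linear-SVM email classifier.
# The emails and labels below are illustrative placeholders.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams + bigrams
    ("svm", LinearSVC()),
])

emails = [
    "please check the status of my claim",
    "I need to update the address on my policy",
    "what does my plan cover for surgery",
    "cancel my policy effective next month",
]
labels = ["claims", "policy_service", "coverage", "policy_service"]

clf.fit(emails, labels)

# decision_function margins can be inspected as rough confidence
# scores, e.g. to flag emails from unseen categories for review.
print(clf.predict(["is knee surgery covered"]))
print(clf.decision_function(["is knee surgery covered"]))
```

With a linear SVM, the `decision_function` margins serve as rough confidence scores, which is one way to plot a confidence distribution for unseen categories.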
New York, US
Built an end-to-end chatbot assistant to facilitate the company's hiring process, collaborating with other engineers.
Implemented LSTM, SVM, and tree-based models to upgrade the hard-coded dialogue (a sketch follows below), and created a user simulator to generate simulation data for testing.
Created and maintained the SQL databases on AWS that the chatbot accesses to retrieve relevant information.
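A minimal sketch of the dialogue-model upgrade, assuming an LSTM intent classifier over tokenized utterances; the vocabulary size, intent count, and training data here are illustrative stand-ins for the simulator output.

```python
# A minimal Keras LSTM intent classifier (all sizes are assumptions).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, N_INTENTS = 5000, 30, 8  # hypothetical sizes

model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),              # token ids -> vectors
    layers.LSTM(64),                               # utterance encoding
    layers.Dense(N_INTENTS, activation="softmax"), # intent probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# A user simulator could emit (utterance, intent) pairs for testing;
# random ids stand in here for simulated, tokenized utterances.
X = np.random.randint(1, VOCAB_SIZE, size=(256, MAX_LEN))
y = np.random.randint(0, N_INTENTS, size=(256,))
model.fit(X, y, epochs=1, verbose=0)
```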
Implemented a deep-learning-powered model to predict palm tree yield (tree detection, tree counting/density estimation, leaf and soil nutrient analysis, fertilization analysis, age estimation, and weather/disaster analysis) from satellite imagery, and incorporated the resulting signals into a palm oil commodity futures trading strategy.
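One plausible shape for the counting/density-estimation piece is a small fully convolutional network that regresses a per-pixel density map whose sum approximates the tree count; the architecture and tile size below are my assumptions, not the original model.

```python
# A minimal density-map counting sketch (architecture and tile size
# are assumptions, not the original system).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_density_net(tile=256):
    inp = tf.keras.Input(shape=(tile, tile, 3))
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="relu")(x)  # non-negative density
    return tf.keras.Model(inp, out)

model = build_density_net()
model.compile(optimizer="adam", loss="mse")  # target: ground-truth density maps

# Estimated tree count for a tile = sum of the predicted density map.
tile = np.random.rand(1, 256, 256, 3).astype("float32")
count = float(model.predict(tile, verbose=0).sum())
print(f"estimated trees in tile: {count:.1f}")
```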
Independently completed a research project on electricity users' behavior, using the Hausman-Taylor model to test the effectiveness of the electricity pricing policy in Shanghai.
Tested and confirmed the hypothesis that even at a small price difference of +/- $0.03/kWh, users under the non-flat-rate policy tend to use 12% less electricity after 22:00 (peak hours).
Used the Hausman-Taylor model (sketched below) to exclude the unobserved individual effect and successfully measure price elasticities.
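For reference, a sketch of the Hausman-Taylor specification in generic notation; the variable roles are assumptions about this study, not its exact regressors.

```latex
% Hausman-Taylor panel model (notation is illustrative):
%   y_{it}   : household electricity consumption
%   x_{it}   : time-varying regressors (e.g. price, hour-of-day usage)
%   z_{i}    : time-invariant household characteristics
%   \alpha_i : unobserved individual effect
\[
  y_{it} = x_{1it}'\beta_1 + x_{2it}'\beta_2
         + z_{1i}'\gamma_1 + z_{2i}'\gamma_2
         + \alpha_i + \varepsilon_{it}
\]
% x_{1}, z_{1} are assumed uncorrelated with \alpha_i; x_{2}, z_{2}
% may be correlated with it. Within-transformed x's and the means of
% x_{1it} serve as internal instruments, removing the bias from
% \alpha_i while (unlike plain fixed effects) keeping coefficients on
% time-invariant z's, so price elasticities remain identified.
```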
Air pollution causes 48,000 deaths in France every year. Over 47 million French people are exposed to levels of particulate air pollution that the WHO considers unsafe.
In agriculture, palm cultivation is one of the largest sectors, with a huge market value. Palm trees are used to produce a variety of products: vegetable oil, biofuel, paper, furniture, decorations, fodder for cattle, and more. Palm oil is also the most widely used vegetable oil in the world.
In this sprint project, we had only 24 hours to present a data solution for Indeed.com. We performed an in-depth analysis of its job-posting data and found some interesting insights. Based on our findings, we proposed a marketing strategy for Indeed and won the Best Insight Award.
Donald Trump and his trade war: during his election campaign, President Donald Trump threatened to impose 35% to 45% tariffs on Chinese imports to force China into renegotiating its trade balance with the U.S. The immediate result would be a fierce trade war.
This notebook is my place to organize my thoughts, share insights, connect with real-world problems, and, most important of all, grow with the data science community.