Citizens Bank is providing a combination of Homebuyer Data for October –
December 2019 and Credit Bureau and demographic data for April – September 2019.
Project ideas include predicting how many people in each zip9 area will buy
homes from October – December 2019 and predicting which of these people will buy
their first home.
SIMULIA is providing two cases of laser-path data: one in which the laser path
interacts with the part boundary and one in which it does not. The challenge is
to predict the temperature and melt pool dimensions at a given point. The
problem set aims to build a machine learning model, trained on simulation data,
that provides fast predictions of melt pool physics to enable process planning
and control.
Fidelity is sponsoring a Corona Impact / Monitoring challenge in which
participants analyze travel data and predict the potential impact of COVID-19
on tourist and traveler arrivals in the US. Participants are encouraged to
consider both modelling and visualization projects.
LUNAR Lab: WikiAtomicEdits
This dataset contains atomic Wikipedia edits, each of which inserts or deletes a
single contiguous chunk of text in an English-language sentence: 13.7 million
insertion examples and 9.3 million deletion examples. Potential projects include
language modelling experiments that compare the semantic encodings learned by
models trained on WikiAtomicEdits with those learned by models trained on raw,
unstructured text, or predicting where in a sentence an insertion occurs.
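To make "atomic insertion of a contiguous chunk" concrete, the sketch below recovers the insertion position and inserted tokens from a before/after sentence pair, which is one way to build labels for an insertion-location task. It assumes the pair is already available as two whitespace-tokenized sentences; the actual WikiAtomicEdits file format is not described here, and `find_insertion` is a hypothetical helper, not part of any dataset tooling. When the position is ambiguous (e.g. inserting a word next to an identical word), it returns the leftmost match.

```python
from typing import List, Optional, Tuple


def find_insertion(base: List[str], edited: List[str]) -> Optional[Tuple[int, List[str]]]:
    """Hypothetical helper: if `edited` equals `base` with exactly one
    contiguous chunk of tokens inserted, return (index, inserted_tokens),
    where `index` is the position in `base` before which the chunk goes.
    Otherwise return None."""
    n = len(edited) - len(base)  # number of inserted tokens
    if n <= 0:
        return None  # an insertion must add at least one token
    # Try every insertion point; the leftmost consistent one wins.
    for i in range(len(base) + 1):
        if edited[:i] == base[:i] and edited[i + n:] == base[i:]:
            return i, edited[i:i + n]
    return None  # the change is not a single contiguous insertion


if __name__ == "__main__":
    before = "the cat sat on the mat".split()
    after = "the black cat sat on the mat".split()
    print(find_insertion(before, after))  # (1, ['black'])
```

A deletion is the mirror image, so swapping the arguments (`find_insertion(after, before)`) locates the deleted chunk.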
Accessing the Datasets
We've also provided a public Datathon 2020 shared Google Drive folder that you
can use to access each public dataset. This folder also contains Google
Colaboratory ("Colab") notebooks from the dataset providers with basic
exploratory data analysis and baseline models. We recommend looking through
these notebooks to get started on your project.
Be sure to also check out the README and the Judging Guidelines document in
this shared folder.
If you'd like to work with the SIMULIA dataset, you must first read and agree to an NDA. Among other things, the NDA prohibits you from using this data for any future work or publishing any information related to it without express permission from SIMULIA.
Entering your name in the Google Form serves as your electronic signature and indicates your agreement to these policies. Keep in mind that this agreement means you may not push any of the data to your GitHub account or publish your final project on the Datathon Devpost. SIMULIA will communicate how to hand in your submission, but this will likely be done through Kaggle. Once you sign the NDA form, you will shortly be added to the private Google Drive folder containing the SIMULIA data, where you can also work with an example Colab notebook.