Detecting and Correcting Outliers in Equipment Data
3/2/2023
Over the last five years, Tractor Zoom has aggregated an extensive repository of data on equipment sales across the country. With a database containing records on over $26 billion in equipment sales, we are an industry leader in providing real-time data on agricultural and heavy equipment sales.
In the last year alone we have listed and cataloged over 450,000 pieces of equipment! That is an exciting milestone, but it has created challenges: the volume of listings has quickly outgrown our ability to manually review and approve each one.
To maintain our high data quality standards, we have implemented an enhanced review process that combines machine learning with our industry experts.
Many of you probably have examples of head-scratching listings where you’re unsure if there was a typo in the list price or if the seller is dreaming. Most of us would see a listing like the examples below and immediately think something was off.
Often that means a potential buyer moves on to another piece of equipment. For our Iron Comps customers, it can mean that bad data gets included in their comparables and market trends. It goes without saying that those are both undesirable, so we set out to do something about it.
We trained machine learning models to churn through tens of thousands of listings a minute to identify issues or inconsistencies. The following three examples were all flagged by our machine learning models as likely outliers. These outliers, or pieces that look a little off, are flagged for further review.
Here is an example of a lot that our machine learning models picked up as likely incorrect based on the number of engine hours relative to the separator hours.
Here is a John Deere 4066R that has 10 times the number of hours of any other 2018 4066R in our database!
The machine learning models noticed that this price was high for the number of hours. Our human experts recognized this is likely a case where hours are set to the maximum number of hours the machine will show.
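As a rough illustration of the idea (a simplified sketch, not the models described in this post), the two hours-based patterns above can be approximated with plain rules. The function names, the `ratio` threshold, and the sample numbers are all hypothetical, and the first rule assumes that a combine's separator only runs while its engine does.

```python
from statistics import median


def separator_exceeds_engine(engine_hours, separator_hours):
    """Flag a combine whose separator hours exceed its engine hours.

    Assumption: the separator only runs while the engine is running,
    so separator hours should never be the larger number.
    """
    return separator_hours > engine_hours


def hours_far_above_peers(hours, peer_hours, ratio=5.0):
    """Flag a listing whose hours are at least `ratio` times the median
    hours of comparable machines (same make, model, and year).

    `ratio` is a made-up threshold for illustration only.
    """
    if not peer_hours:
        return False  # nothing to compare against
    typical = median(peer_hours)
    return typical > 0 and hours >= ratio * typical


# Made-up numbers, for illustration only.
print(separator_exceeds_engine(engine_hours=1200, separator_hours=1500))
print(hours_far_above_peers(9999, peer_hours=[400, 650, 900]))
```

Rules like these are easy to explain to a reviewer, but they only catch patterns someone has already thought of. That is one reason to pair automated checks with human experts.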
Our Solution to Bad Data
You might be asking yourself, "That’s pretty neat that you found these outliers, but what are you going to do about them?"
That’s a fantastic question! We took a human-centered approach to solving this problem by making sure we are giving our team of equipment experts a smaller number of listings to manually review. In this case, we use machine learning to make our team more efficient and allow us to continue to expand the breadth of our data without compromising data quality or hiring a ton of additional help.
Once our team receives a notification of a likely outlier, we work through a process to try to correct the data if possible.
For example, the listing may have an incorrect number of engine hours but we are able to verify the number of hours in a picture. Here, we can quickly correct the listing on Tractor Zoom and in our database.
In some cases, we are unable to verify what the correct value should be so we reach out to our auctioneer and dealer partners to alert them to the issue and get the data corrected.
Finally, if we are still unable to verify the correct values, we exclude these lots from Iron Comps' core comparable evaluations.
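The review steps above can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the `Action` names and the `next_action` parameters are hypothetical and do not describe an actual Tractor Zoom system.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CORRECT_LISTING = "correct the listing and the database"
    CONTACT_PARTNER = "alert the auctioneer or dealer"
    EXCLUDE_FROM_COMPS = "exclude the lot from comparables"


def next_action(
    value_from_listing: Optional[float],
    partner_contacted: bool,
    value_from_partner: Optional[float],
) -> Action:
    """Pick the next step for a flagged listing (hypothetical helper).

    value_from_listing: value verified from the listing itself,
        for example hours read from a picture, or None.
    partner_contacted: whether the seller has already been asked.
    value_from_partner: corrected value supplied by the seller, or None.
    """
    if value_from_listing is not None:
        return Action.CORRECT_LISTING
    if not partner_contacted:
        return Action.CONTACT_PARTNER
    if value_from_partner is not None:
        return Action.CORRECT_LISTING
    return Action.EXCLUDE_FROM_COMPS


# A flagged lot with nothing verifiable and no answer from the seller.
print(next_action(None, partner_contacted=True, value_from_partner=None).value)
```

Exclusion is the last branch on purpose: a lot is only dropped from comparables after both ways of verifying the correct value have failed.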
Data quality is one of the core tenets of our work. We know that bad data causes bad analytics and leads to bad decisions. More succinctly: Garbage In, Garbage Out.
It doesn’t matter how much history a company has or how much market share it captures: if we don’t trust the data, it isn’t useful.
As an Iron Comps customer, you can count on our data being accurate and up-to-date. As a dealer or auctioneer partner, you can depend on us to help monitor your listings for any issues.
If you have any questions about our process to detect outliers and data quality, feel free to contact Hank Mandsager, Lead Data Scientist.