Quantifying Quality

One of the things that I very often do to find inspiration is to read. I’m kind of a nerd when it comes to reading because about 99% of what I read is non-fiction. I take some grief about this from the few people who know this about me. It is deeply embedded in my personality.

My most recent inspiration came from reading The Signal and the Noise by Nate Silver. Silver’s expertise is prognostication, specifically using Bayesian statistics to draw conclusions about the probabilities of future events. For example, he correctly predicted 49 of 50 states in the 2008 presidential election and all 50 states in 2012.

Bayesian statistics use a sometimes subjective prior probability to make a prediction about the probability of a future event. For example, the number and strength of past earthquakes increase the estimated probability of future, stronger earthquakes.
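To make that concrete, here is a minimal sketch of a Bayesian update. The numbers are entirely made up for illustration; this is the textbook formula, not Silver’s actual model.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | evidence) via Bayes' theorem."""
    numerator = p_evidence_given_h * prior
    denominator = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / denominator

# Hypothetical: start with a 10% prior that a large earthquake will occur.
# A swarm of smaller quakes is three times as likely if a large one is coming
# (0.6 vs. 0.2). The prior of 0.10 rises to a posterior of 0.25.
posterior = bayes_update(prior=0.10, p_evidence_given_h=0.6, p_evidence_given_not_h=0.2)
print(round(posterior, 2))  # 0.25
```

The subjective part is exactly the prior and the likelihoods: change those inputs and the conclusion changes with them.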

The reason for my post, as I normally write about education, is to ask whether we can use Bayesian statistics to determine whether the new Framework for Teaching, developed by Charlotte Danielson and adopted by Pennsylvania as its new evaluation tool, can predict the number of ineffective teachers in a school.

The Danielson model, by her own admission, was developed to provide teachers a framework through which to improve their teaching. Therefore, what we probably want to predict is how likely teachers are to improve by using the Framework.

Another question to answer, and Silver has alluded to his desire to attempt it, is whether any subjective measure can really quantitatively measure the effectiveness of a teacher. In a reddit IAmA, he stated, “There are certainly cases where objective measures applied badly is worse than not applying them at all, and education may well be one of them.” He said this in regard to a question about using test scores to rate teacher effectiveness.

In Pennsylvania, teachers and administrators will be rated on a combination of both: standardized test scores and the Danielson Framework for Teaching. One is a concrete measure that historically has been shown to be determined more by location than by quality teaching; the other is a qualitative, formative measure that will be applied quantitatively.

I know it’s like rearranging deck chairs on the Titanic, but the argument continually needs to be made that education is more qualitative than quantitative; more art than science.

False Proxy + False Proxy = Your Life

Inspiration to write can come from a lot of places. For me it comes quite often from Seth Godin’s blog and a friend who goads me into connecting his work to education.

Today Mr. Godin blogged about false proxy traps. You can check out his blog for details. In a nutshell, a false proxy is when someone measures a component of something that is difficult to measure in order to judge the whole. Good example: measuring the quality of a police force by how many people are put in jail. This measure would not take into account that a good police force may limit crime by its mere presence or that it is exemplary at solving problems. Crazy example: measuring the power of the Republican Party by watching Fox News exclusively.

Not everyone may agree, but the forced high-stakes testing required by NCLB is just such a trap. The idea of the testing program is to determine the quality of a school and its staff. Make no mistake about it: these tests, differently labeled in each state, were never meant to test the knowledge of students. The false proxy comes in when we take one test, administer it to thousands of students, and then compare them across a wide breadth of cultures, economies, and immeasurable demographics. My guess is that a district’s aggregate PSSA score can predict the median income of the school’s coverage area just as accurately as it can the success of the school. It could also pretty accurately predict the number of parents who attend parent conferences. The first thought would be easy to test: take every school and list them from high to low by aid ratio (market value/personal income), then make another list sorted from low to high by district average PSSA score. I’d be willing to bet there is a high degree of correlation. It’s all public knowledge; give it a whirl!
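If you did want to give it a whirl, the two sorted lists amount to a rank correlation. Here is a rough sketch using a Spearman rank correlation; the five districts and their figures below are invented purely to show the mechanics, not real PSSA data.

```python
def ranks(values):
    """Rank values 1..n (1 = smallest); ties ignored for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation via the classic sum-of-d-squared formula."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical districts: higher aid ratio (poorer district), lower scores.
aid_ratio = [0.80, 0.65, 0.50, 0.40, 0.30]   # market value / personal income
pssa_avg  = [1210, 1260, 1330, 1365, 1400]   # district average PSSA score

print(spearman(aid_ratio, pssa_avg))  # -1.0: a perfectly inverse ranking
```

A result near -1 would mean wealth, not instruction, is driving the rankings, which is exactly the bet being made above. Real data would of course have ties and noise, so a library routine that handles ties would be the better tool.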

So, I think we have argued pretty convincingly that the PSSA is a false proxy for determining the quality of a school. Don’t get me wrong; some teachers should find a new career path. But I can compare scores of teachers I work with whose abilities are across the board in terms of quality instruction, and the ones with limited skills have students who do just as well as the distinguished teachers’ students.

Second false proxy: the new Pennsylvania teacher evaluation model. This is even simpler. Charlotte Danielson developed this model to assist in improving the quality of teaching. Never, and the company developing the evaluation tool for Pennsylvania has admitted this, did she intend for the rubric to be reduced to a number. Statistically speaking, you can’t take a qualitative measure and simply quantify it. That is, however, what the Pennsylvania Department of Education intends to do. A tool built to identify the strengths and weaknesses of a teacher and guide him or her toward being a distinguished educator will instead be used to measure his or her effectiveness.

Not only will all of this high-quality information be watered down to a single number, but that number will count as 50% of a teacher’s – and eventually an administrator’s – annual evaluation. Throw in that another 15-30% of the annual evaluation will be determined by PSSA scores and you have a conglomeration of false proxies and statistical fallacies. Good luck! Two years of low scores and poor observations, or probably just two years of average observations and average PSSA scores, and you may be looking for a job – and me too!
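For what it’s worth, the weighting itself is simple arithmetic. The 50% observation and 15-30% PSSA weights come from the post above; the 0-3 scale, the component scores, and the idea that the remainder comes from “other measures” are my own hypothetical fill-ins.

```python
def composite(observation, pssa, other, pssa_weight=0.15):
    """Blend component scores: 50% observation, 15-30% PSSA, remainder other.

    All inputs are hypothetical scores on a 0-3 scale; the only weights taken
    from the actual policy description are the 50% and the 15-30% range.
    """
    other_weight = 1.0 - 0.50 - pssa_weight
    return 0.50 * observation + pssa_weight * pssa + other_weight * other

# A middling teacher: observation 1.5, PSSA-linked 1.2, other measures 1.8.
score = composite(observation=1.5, pssa=1.2, other=1.8)
print(round(score, 2))  # 1.56
```

The point of the exercise: a single blended number like 1.56 hides which component dragged it down, which is precisely the information the Danielson rubric was designed to surface.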