Excel Expertise or Thinking off the Wall

charlieoscar

Member
Sep 28, 2014
1,339
Building a database of line scores from Retrosheet data (which is simple) but I also intend to include fields for net score after each inning (Hscore - Vscore). From that I want to look at when teams actually win or lose games.

Poking around the web led to a MATCH-INDEX formula that works. Except...I realized that a team could hold the lead, lose it, and then regain it, so the net score could go like 0,+2,-1,+1,-2,+5...., and the formula would return +2 when the answer should be +5.

Any ideas? I'm currently in a forest/trees state.
 

tonyandpals

Well-Known Member
Lifetime Member
SoSH Member
Mar 18, 2004
7,853
Burlington
Is there no other way to derive the win?

edit: removed previous post.

How is net breaking down?

Score tied, home team winning by two, home team losing by 1, home team winning by one, home team losing by 2, home team winning by 5?
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
It would be helpful to see an example of what you're talking about. Post an example to Google Sheets?
 

InstaFace

The Ultimate One
SoSH Member
Sep 27, 2016
21,590
Pittsburgh, PA
i'm not following. I assume you're creating fields like ScoreAfter1, ScoreAfter2 ... 9, by doing the sum of the inning scores for home team (up through that inning) minus the inning scores of the visiting team. There's no reason that should only return the final score of the game. What's going wrong?
 

Max Power

thai good. you like shirt?
SoSH Member
Jul 20, 2005
7,838
Boston, MA
I think he's looking at the point where the lead is gained and never lost. That means you need to work backwards in your data and look at the first time the lead goes to zero or negative runs. Then add an inning.
 

charlieoscar

Member
Sep 28, 2014
1,339
I think he's looking at the point where the lead is gained and never lost. That means you need to work backwards in your data and look at the first time the lead goes to zero or negative runs. Then add an inning.
Yes, I'm looking for the inning in which the winning team has scored enough runs to win the game, regardless of whether or not it scores more runs. I thought about doing it backwards but then I got preoccupied with extra-inning games and ones that were shorter than nine. I'll have to think about that some more. [or lose games]

Retrosheet's line scores are complete back through the 1906 season and I think it would be interesting to see if/how scoring has changed through different eras. Thanks.
 

Max Power

thai good. you like shirt?
SoSH Member
Jul 20, 2005
7,838
Boston, MA
Yes, I'm looking for the inning in which the winning team has scored enough runs to win the game, regardless of whether or not it scores more runs. I thought about doing it backwards but then I got preoccupied with extra-inning games and ones that were shorter than nine. I'll have to think about that some more. [or lose games]

Retrosheet's line scores are complete back through the 1906 season and I think it would be interesting to see if/how scoring has changed through different eras. Thanks.
Dedicate a column to inning numbers and just return that value rather than the count. The total number of innings doesn't matter.
 

charlieoscar

Member
Sep 28, 2014
1,339
Dedicate a column to inning numbers and just return that value rather than the count. The total number of innings doesn't matter.
Not sure I understand. I have a table with individual innings for the visiting and home teams and then another table with the difference in scores after each inning. Vinn1/Vinn2...; Hinn1/Hinn2... and AFT1/AFT2.... One of the AFT columns in a row will have the number I want but I haven't figured out how to get Excel to recognize that for each game. If I could dedicate a column and fill it in, I wouldn't have my problem :).

Maybe I'll try flow charting it.
 

loafnut

Member
SoSH Member
Jul 18, 2005
165
I can think of two ways to do this. I think index-match-index would work or an array index match. See here:
https://exceljet.net/formula/index-and-match-with-multiple-criteria

The other way is to create a text string of the net score after each inning using text values in reverse order.

So a 5 inning game that went 1,-2,0,-1,+2 would be coded as +1,-2, +-0 , -1, +2 would concatenate to "+2-1+-0-2+1" in a given cell. For this to work, every cell must be in text format. In my spreadsheet this reverse order value is in cell G1.

Then use this formula on the concatenated value:
=IF(ISERROR(ROUNDDOWN(IF(LEFT(G1,1)="-",(SEARCH("+",G1)-1)/2,(SEARCH("-",G1)-1)/2),0))=TRUE,"Firstinning",ROUNDDOWN(IF(LEFT(G1,1)="-",(SEARCH("+",G1)-1)/2,(SEARCH("-",G1)-1)/2),0))

The essence of the formula is this: =IF(LEFT(G1,1)="-",(SEARCH("+",G1)-1)/2,(SEARCH("-",G1)-1)/2)
Where if the first character of the string (which is the last inning) is a "-" then it searches the string for the first "+" and vice versa searching for "-" where the last inning is a "+". Since the concatenated string is the game by inning in reverse order, this searches backwards through the game for the last time the losing team led the game. Subtracting 1 and dividing by 2 will get you how many innings back from the end of the game the final lead change occurred. Should be simple from there to translate that to an inning based on how long the game was. I've used rounddown to get rid of half innings caused by the zero having both signs and iserror to get rid of problems where a team takes the lead in the first inning and never gives it up. In that case it returns "firstinning".

I'm not sure if I totally understand your problem, but I think the reverse concatenate here is the key to getting the "+5" instead of the "+2" in the example from your first post.
 

charlieoscar

Member
Sep 28, 2014
1,339
Let me try explaining what I want to do one more time. Suppose the visiting team does not score in the top of the 1st but the home team scores 4 runs in the bottom of the 1st. Now, if the visiting team's final score is 3 runs, then effectively the home team won the game in the bottom of the 1st. The problem I am having with this is when the lead changes hands during a game. If a game goes into the bottom of the 9th or into extra-innings, you know when the winning run was scored and you also know in shut-outs; it's the other games.

MATCH would do what I want except the array searched has to be sorted in ascending or ascending order.
 

InstaFace

The Ultimate One
SoSH Member
Sep 27, 2016
21,590
Pittsburgh, PA
I know excel, I love excel, but people over-obsess about trying to make Excel do things that are algorithmically complex enough that it's no longer the right tool for the job. This sort of thing would be like 20-30 lines of code in PHP or Python or whatever your scripting language of choice would be. (or 200 lines of buggy VB, but leave that aside)

If you can represent this logic as a bunch of for() loops and if()-else() statements, you should just code it as such.
 

SirPsychoSquints

Member
SoSH Member
Jul 13, 2005
4,985
Pittsburgh, PA
Let me try explaining what I want to do one more time. Suppose the visiting team does not score in the top of the 1st but the home team scores 4 runs in the bottom of the 1st. Now, if the visiting team's final score is 3 runs, then effectively the home team won the game in the bottom of the 1st. The problem I am having with this is when the lead changes hands during a game. If a game goes into the bottom of the 9th or into extra-innings, you know when the winning run was scored and you also know in shut-outs; it's the other games.

MATCH would do what I want except the array searched has to be sorted in ascending or ascending order.
Can you share a small extract? I think we could probably do it with some nested formulas. Like, if the losing team scores 4 runs, you want to identify the first inning that the winning team had accumulated 5 runs. It should be do-able.
 

mt8thsw9th

anti-SoSHal
SoSH Member
Jul 17, 2005
17,120
Brooklyn
So out of curiosity I was playing with this quickly, and it's really getting the "when did they win?" to populate, which I'm trying to figure out.



I mean, obviously they won in the 7th when that tie turned into an "X", but I'm just trying to figure out the best way to do a nested IF playing off if they got the W or not, and have it populate the inning that the game was won.

edit - basically need something that looks for, and returns the last instance of "Tie" turning into "X" and that should always be the inning that the game was won, even if it was the first. In my example it's the 7th.
 
Last edited:

Max Power

thai good. you like shirt?
SoSH Member
Jul 20, 2005
7,838
Boston, MA
Let me try explaining what I want to do one more time. Suppose the visiting team does not score in the top of the 1st but the home team scores 4 runs in the bottom of the 1st. Now, if the visiting team's final score is 3 runs, then effectively the home team won the game in the bottom of the 1st. The problem I am having with this is when the lead changes hands during a game. If a game goes into the bottom of the 9th or into extra-innings, you know when the winning run was scored and you also know in shut-outs; it's the other games.

MATCH would do what I want except the array searched has to be sorted in ascending or ascending order.
I see what you're trying to do and I think you're going about it in a convoluted way. If a team "wins" the game by scoring 4 runs at any point in the game when the other team scores 3, then you need to take the losing team's final run total and find the inning in which the winning team exceeded it. Lead changes are irrelevant in your definition.
 

charlieoscar

Member
Sep 28, 2014
1,339
I mean, obviously they won in the 7th when that tie turned into an "X", but I'm just trying to figure out the best way to do a nested IF playing off if they got the W or not, and have it populate the inning that the game was won.
I've been hoping to stay away from nested IFs. I think you need to look at the point where the losing team has scored its final runs then find the inning when the winning team exceeds that total. I've got something I'm going to try but I have to think about it some more.
 

SirPsychoSquints

Member
SoSH Member
Jul 13, 2005
4,985
Pittsburgh, PA
I've been hoping to stay away from nested IFs. I think you need to look at the point where the losing team has scored its final runs then find the inning when the winning team exceeds that total. I've got something I'm going to try but I have to think about it some more.
This shouldn't be that complicated. SHARE SOME DATA and I'll make a formula.
 

Five Cent Head

64th note
Gold Supporter
SoSH Member
Jul 17, 2007
761
Seattle
Let me try explaining what I want to do one more time. Suppose the visiting team does not score in the top of the 1st but the home team scores 4 runs in the bottom of the 1st. Now, if the visiting team's final score is 3 runs, then effectively the home team won the game in the bottom of the 1st. The problem I am having with this is when the lead changes hands during a game. If a game goes into the bottom of the 9th or into extra-innings, you know when the winning run was scored and you also know in shut-outs; it's the other games.
Does a team effectively win when they take the lead for good, or when they surpass the other team’s final run total? The link I posted gives at least one way to find the last place in a column where a condition is met, whether that’s the last time before team A took the lead for good, or the last time before team A’s run total became greater than team B’s final run total.
 

loafnut

Member
SoSH Member
Jul 18, 2005
165
Let me try explaining what I want to do one more time. Suppose the visiting team does not score in the top of the 1st but the home team scores 4 runs in the bottom of the 1st. Now, if the visiting team's final score is 3 runs, then effectively the home team won the game in the bottom of the 1st. The problem I am having with this is when the lead changes hands during a game. If a game goes into the bottom of the 9th or into extra-innings, you know when the winning run was scored and you also know in shut-outs; it's the other games.

MATCH would do what I want except the array searched has to be sorted in ascending or ascending order.
My idea of what you wanted was right. The above formula I posted works for what you have described here. You would just have to code with pluses and minuses which might be a pain. It’s convoluted to do this in excel imho and maybe a different tool would work better.
 

charlieoscar

Member
Sep 28, 2014
1,339
I may have it figured out but I have to run a bunch more data through it. MATCH returns the location in a data array but it wants the data in ascending order and then returns the first, not last. It's going to be a pain because I'll have something like a million records to process. I'll post some data and the results when I get done.
 

SirPsychoSquints

Member
SoSH Member
Jul 13, 2005
4,985
Pittsburgh, PA
I may have it figured out but I have to run a bunch more data through it. MATCH returns the location in a data array but it wants the data in ascending order and then returns the first, not last. It's going to be a pain because I'll have something like a million records to process. I'll post some data and the results when I get done.
We're asking for a small subset of the records so we can look at it and try to offer a solution, but I'm also interested in seeing the result, sure.
 

DanoooME

above replacement level
SoSH Member
Mar 16, 2008
19,796
Henderson, NV
I may have it figured out but I have to run a bunch more data through it. MATCH returns the location in a data array but it wants the data in ascending order and then returns the first, not last. It's going to be a pain because I'll have something like a million records to process. I'll post some data and the results when I get done.
Wait, you're using Excel for a million rows of data? InstaFace has it right; you're using the wrong tool for this problem. You must not be working with the entire data set, otherwise I would have expected a crash by now.
 

charlieoscar

Member
Sep 28, 2014
1,339
Wait, you're using Excel for a million rows of data? InstaFace has it right; you're using the wrong tool for this problem. You must not be working with the entire data set, otherwise I would have expected a crash by now.
No, I'm downloading comma-delimited files by decade (well, the years are zipped by decade) into Excel to make tables for an Access database.
 

mt8thsw9th

anti-SoSHal
SoSH Member
Jul 17, 2005
17,120
Brooklyn
I may have it figured out but I have to run a bunch more data through it. MATCH returns the location in a data array but it wants the data in ascending order and then returns the first, not last. It's going to be a pain because I'll have something like a million records to process. I'll post some data and the results when I get done.
You can't reverse your array, simply by having a row below with A1 = A9, A2 = A8, etc., and then run the MATCH on that?
 

charlieoscar

Member
Sep 28, 2014
1,339
You can't reverse your array, simply by having a row below with A1 = A9, A2 = A8, etc., and then run the MATCH on that?
The syntax of MATCH is MATCH(lookup_value, lookup_array, [match_type]), where match_type =
1 --- finds the largest value that is less than or equal to the lookup_value
0 --- finds first value identical to lookup_value
-1-- finds the smallest value that is greater than or equal to the lookup_value

Finding a match_value equal to the lookup_value is a problem as it may find an inning in which the score is tied or the first inning in which identical scores were derived.

This is some data:

 

SirPsychoSquints

Member
SoSH Member
Jul 13, 2005
4,985
Pittsburgh, PA
The syntax of MATCH is MATCH(lookup_value, lookup_array, [match_type]), where match_type =
1 --- finds the largest value that is less than or equal to the lookup_value
0 --- finds first value identical to lookup_value
-1-- finds the smallest value that is greater than or equal to the lookup_value

Finding a match_value equal to the lookup_value is a problem as it may find an inning in which the score is tied or the first inning in which identical scores were derived.

This is some data:

So I can do some dumb nest If statements! I copied your A1 through R11. In Column T through X I titled them Vscore, Hscore, Needed to win & Winner, respectively. Vscore & Hscore are obvious,

  • Needed to win is: =MIN(T2:U2)+1
  • Winner is: =IF(T2>U2,"V","H")
  • Win Inning is the stupid one: =IF(W2="H",IF(AH2>=V2,1,IF(AI2>V2,2,IF(AJ2>V2,3,IF(AK2>V2,4,IF(AL2>V2,5,IF(AM2>V2,6,IF(AN2>V2,7,IF(AO2>V2,8,9)))))))),IF(Y2>=V2,1,IF(Z2>V2,2,IF(AA2>V2,3,IF(AB2>V2,4,IF(AC2>V2,5,IF(AD2>V2,6,IF(AE2>V2,7,IF(AF2>V2,8,9)))))))))
  • Where columns Y through AG are each the visitor's total runs after each inning 1 through 9 and AI through AP are the home team's total runs after each inning 1 through 9
This only works for 9 inning games, but you said the extra inning part would be easy to append to this.

For rows 2 through 11, I came up with the home team winning in innings:
  • 1
  • 9
  • 7
  • 8
  • 7
  • 2
  • 9
  • 6
  • 8
  • 9
 

charlieoscar

Member
Sep 28, 2014
1,339
No, you need to look at lines 14-23, which show the cumulative scoring by inning, e.g., SUM($A14:A14), ..., SUM($A14:I14)...; SUM($J14:J14),...SUM($J14:R14).

Then you find the first inning in which the Home Score in a line is greater than the first highest score in the Visitors line. I filled in those in the Home Score line in red (and they are listed numerically in column T opposite the first tables).
 

SirPsychoSquints

Member
SoSH Member
Jul 13, 2005
4,985
Pittsburgh, PA
No, you need to look at lines 14-23, which show the cumulative scoring by inning, e.g., SUM($A14:A14), ..., SUM($A14:I14)...; SUM($J14:J14),...SUM($J14:R14).

Then you find the first inning in which the Home Score in a line is greater than the first highest score in the Visitors line. I filled in those in the Home Score line in red (and they are listed numerically in column T opposite the first tables).
Im pretty sure my formulas did that without creating extra rows, but making extra columns instead.

Edit: I mean, did you get different results than I did?
 

tonyandpals

Well-Known Member
Lifetime Member
SoSH Member
Mar 18, 2004
7,853
Burlington
Im pretty sure my formulas did that without creating extra rows, but making extra columns instead.

Edit: I mean, did you get different results than I did?
He has his results above and yes he got different results. 4th game for example is not won in the 8th as your set says, it's won in the. The 5th game is one in the 3rd, not 7th. I didn't check the rest.
 

charlieoscar

Member
Sep 28, 2014
1,339
Edit: I mean, did you get different results than I did?
In Row 1, the next-to-last column (Col. T) on the right is Innwon, and that column contains the inning in which each game was won.

I originally had some accompanying explanations in the Google Sheet but they caused columns to stretch out to very wide lengths, so I removed them and forgot to add them in my written explanation. Sorry.
 

SirPsychoSquints

Member
SoSH Member
Jul 13, 2005
4,985
Pittsburgh, PA
In Row 1, the next-to-last column (Col. T) on the right is Innwon, and that column contains the inning in which each game was won.

I originally had some accompanying explanations in the Google Sheet but they caused columns to stretch out to very wide lengths, so I removed them and forgot to add them in my written explanation. Sorry.
I corrected my "Win Inning" fomula below. The problem was I used ">" instead of ">=" several times. I now get all the same results as you:

=IF(W2="H",IF(AH2>=V2,1,IF(AI2>=V2,2,IF(AJ2>=V2,3,IF(AK2>=V2,4,IF(AL2>=V2,5,IF(AM2>=V2,6,IF(AN2>=V2,7,IF(AO2>=V2,8,9)))))))),IF(Y2>=V2,1,IF(Z2>=V2,2,IF(AA2>=V2,3,IF(AB2>=V2,4,IF(AC2>=V2,5,IF(AD2>=V2,6,IF(AE2>=V2,7,IF(AF2>=V2,8,9)))))))))
 

charlieoscar

Member
Sep 28, 2014
1,339
I corrected my "Win Inning" fomula below. The problem was I used ">" instead of ">=" several times. I now get all the same results as you:
As I remarked earlier, I was hoping to avoid nested IF statements. I thought there might be some Excel "trick" that I was overlooking. But, thanks.
 

charlieoscar

Member
Sep 28, 2014
1,339
I know excel, I love excel, but people over-obsess about trying to make Excel do things that are algorithmically complex enough that it's no longer the right tool for the job. This sort of thing would be like 20-30 lines of code in PHP or Python or whatever your scripting language of choice would be. (or 200 lines of buggy VB, but leave that aside)

If you can represent this logic as a bunch of for() loops and if()-else() statements, you should just code it as such.
Yeah, once upon a time. I was quite good at programming...Fortran with assembly language subroutines, but that was a long time ago and I haven't done any programming since MS-Dos days when Basic was included, so it might be a little more for me than just scribbling out 20-30 lines of code. Admittedly, most people who use Retrosheet data for analysis do use code.

And just as an aside to all this, each record in a Retrosheet's Game Log has 161 fields, of which only about 10 are needed for this study, and the main two--visitor and home inning-by-inning scoring--are presented as words--102300[12]00, where [12] would be a 12-run inning by one team. So, your script would either have to separate each of those words into columns by innings (variable in game length) and you can do what I do and open the comma-delimited file in Excel, deleted the unneeded fields, copy the scoring word fields to my text editor and run a Regular Expressions/Perl routine to convert then to comma-delimited, and re-copy them back to Excel, or if you are really good, then maybe you can do all of it by coding.
 

InstaFace

The Ultimate One
SoSH Member
Sep 27, 2016
21,590
Pittsburgh, PA
that sounds like it would be 3-5x simpler by coding, in terms of total time spent getting the output you want. You can then format it using Excel if you like but at that point it's trivial.

If you used to code, you've crossed the hardest barrier which is learning to think like a coder. If this is something you're spending your free time on, your time will be well rewarded by chugging through some basic Python documentation. I mean, if I have a string input like "102300[12]00", parsing that into an inningScore[x] array (in PHP) or List/Dict (Python) would seriously be like 10 lines of code max. Remember, a String is a sequence, every character is an entry in that sequence. Since I can assume that there are no 3+ digit inning scores, I can take the shortcut that anytime we hit brackets, the next two characters are the inning score (but making it generalized wouldn't be much harder).

Code:
i = 0
inningScore = [] # NB - zero-based array, will have indices 0-8

while i < len(gameString):
    if gameString[i] == '[':
        inningScore.append( int(gameString[i+1] + gameString[i+2]) )
        i += 3 # move past the close bracket
    else:
        inningScore.append( int(gameString) )
        i += 1
This really is beginner-level stuff for real coding. You're frankly doing a lot more mental work trying to shoehorn this into Excel than you would by just biting the bullet and changing toolkits, even if it didn't feel that way at the start.
 

charlieoscar

Member
Sep 28, 2014
1,339
This really is beginner-level stuff for real coding. You're frankly doing a lot more mental work trying to shoehorn this into Excel than you would by just biting the bullet and changing toolkits, even if it didn't feel that way at the start.
As I noted, I already have a mechanism for parsing the innings score words. I simply replace all inning scores like [12] with a letter unique to the score (102300[12]00 becomes 102300c00, insert a semi-colon after each letter/number, then replace the letters with their corresponding numbers. I use NoteTab Pro as a text editor and it has its own programming language built in, largely based on Regular Expressions. It also can act as a clipboard, so I can turn that on, select a column in Excel and hit Ctrl+C. Now when I switch to NoteTab, the parser I wrote will process several thosand lines in a second or two and I can just paste it back into Excel.

As for using Excel, I also noted that each record from a Retrosheet Game Log will have 161 fields in it, of which I have to delete about 150. The data I get is comma-delimited (.csv) and the easiest way I know how to deal with that is simply opening it up in Excel. That also makes it easy to decline importing unwanted fields. Once all that was done, I figured there must be some reasonable way to manipulate the data (inning-by-inning scores) in Excel to get what I need. Unfortunately, I haven't succeeded in that part.

This is a link to the Retrosheet page with Game Log data (and that also includes a link to a Guide to Retrosheet Game Logs). Data can be downlaoded by year, decades, or en masse.
https://www.retrosheet.org/gamelogs/index.html

Thanks
 

tonyandpals

Well-Known Member
Lifetime Member
SoSH Member
Mar 18, 2004
7,853
Burlington
Why avoid the nested IF, if your desired platform is excel? Write the formula once and be done.

Also, using excel to get rid of the fields your don't want becomes tiresome. You could easily use many scripting languages to read in the entire file, it out the desired fields in a new file. Anything SQL based would be a breeze once set up. Push a button and your file is transformed. Could nest the if() in there too.
 

charlieoscar

Member
Sep 28, 2014
1,339
I was avoiding nested IF statements because I thought (was hoping) there was a saner way of accomplishing what I wanted in Excel. My knowledge of SQL is very limited and I would need to spend time becoming proficient in that as well as whatever scripting language worked with it, Windows 8.1, and MS Access (or learn a new free database system). Since the only solution this group has come up with for Excel is nested IFs, then I guess I'll be following that route. But, thanks for the idea.
 

section15

Member
SoSH Member
Mar 23, 2007
227
Bradford, MA and section 15
I used to be an extremely proficient SAS (Statistical Analysis System) coder - it's like SQL in some ways, like FORTRAN in others, with built-in scientific and statistical functions. Very easy to learn, but.... it's not free. Or cheap, if you're using it commercially ---

www.sas.com
 

jercra

No longer respects DeChambeau
SoSH Member
Jul 31, 2006
3,145
Arvada, Co
I used to be an extremely proficient SAS (Statistical Analysis System) coder - it's like SQL in some ways, like FORTRAN in others, with built-in scientific and statistical functions. Very easy to learn, but.... it's not free. Or cheap, if you're using it commercially ---

www.sas.com
 

charlieoscar

Member
Sep 28, 2014
1,339
I used to be an extremely proficient SAS (Statistical Analysis System) coder - it's like SQL in some ways, like FORTRAN in others, with built-in scientific and statistical functions. Very easy to learn, but.... it's not free. Or cheap, if you're using it commercially ---

www.sas.com
While it might be good, this is a one-time non-commercial project and I guess it will be Excel nested IFs. Thanks