Dismiss Notice
Guest, I have a big favor to ask you. We've been working very hard to establish ourselves on social media. If you like/follow our pages it would be a HUGE help to us. SoSH on Facebook and Inside the Pylon Thanks! Nip

Excel Expertise or Thinking off the Wall

Discussion in 'BYTE ME: Technology discussion' started by charlieoscar, Nov 29, 2018.

  1. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    Building a database of line scores from Retrosheet data (which is simple) but I also intend to include fields for net score after each inning (Hscore - Vscore). From that I want to look at when teams actually win or lose games.

    Poking around the web led to a MATCH-INDEX formula that works. Except...I realized that a team could hold the lead, lose it, and then regain it, so the net score could go like 0,+2,-1,+1,-2,+5...., and the formula would return +2 when the answer should be +5.

    Any ideas? I'm currently in a forest/trees state.
     
  2. tonyandpals

    tonyandpals Well-Known Member Lifetime Member SoSH Member

    Messages:
    5,816
    Is there no other way to derive the win?

    edit: removed previous post.

    How is net breaking down?

    Score tied, home team winning by two, home team losing by 1, home team winning by one, home team losing by 2, home team winning by 5?
     
  3. bowiac

    bowiac I've been living a lie. Lifetime Member SoSH Member

    Messages:
    11,998
    It would be helpful to see an example of what you're talking about. Post an example to Google Sheets?
     
  4. InstaFace

    InstaFace MDLzera

    Messages:
    5,920
    i'm not following. I assume you're creating fields like ScoreAfter1, ScoreAfter2 ... 9, by doing the sum of the inning scores for home team (up through that inning) minus the inning scores of the visiting team. There's no reason that should only return the final score of the game. What's going wrong?
     
  5. Max Power

    Max Power thai good. you like shirt? SoSH Member

    Messages:
    3,307
    I think he's looking at the point where the lead is gained and never lost. That means you need to work backwards in your data and look at the first time the lead goes to zero or negative runs. Then add an inning.
     
  6. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    Yes, I'm looking for the inning in which the winning team has scored enough runs to win the game, regardless of whether or not it scores more runs. I thought about doing it backwards but then I got preoccupied with extra-inning games and ones that were shorter than nine. I'll have to think about that some more. [or lose games]

    Retrosheet's line scores are complete back through the 1906 season and I think it would be interesting to see if/how scoring has changed through different eras. Thanks.
     
  7. Max Power

    Max Power thai good. you like shirt? SoSH Member

    Messages:
    3,307
    Dedicate a column to inning numbers and just return that value rather than the count. The total number of innings doesn't matter.
     
  8. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    Not sure I understand. I have a table with individual innings for the visiting and home teams and then another table with the difference in scores after each inning. Vinn1/Vinn2...; Hinn1/Hinn2... and AFT1/AFT2.... One of the AFT columns in a row will have the number I want but I haven't figured out how to get Excel to recognize that for each game. If I could dedicate a column and fill it in, I wouldn't have my problem :).

    Maybe I'll try flow charting it.
     
  9. loafnut

    loafnut Member SoSH Member

    Messages:
    52
    I can think of two ways to do this. I think index-match-index would work or an array index match. See here:
    https://exceljet.net/formula/index-and-match-with-multiple-criteria

    The other way is to create a text string of the net score after each inning using text values in reverse order.

    So a 5 inning game that went 1,-2,0,-1,+2 would be coded as +1,-2, +-0 , -1, +2 would concatenate to "+2-1+-0-2+1" in a given cell. For this to work, every cell must be in text format. In my spreadsheet this reverse order value is in cell G1.

    Then use this formula on the concatenated value:
    =IF(ISERROR(ROUNDDOWN(IF(LEFT(G1,1)="-",(SEARCH("+",G1)-1)/2,(SEARCH("-",G1)-1)/2),0))=TRUE,"Firstinning",ROUNDDOWN(IF(LEFT(G1,1)="-",(SEARCH("+",G1)-1)/2,(SEARCH("-",G1)-1)/2),0))

    The essence of the formula is this: =IF(LEFT(G1,1)="-",(SEARCH("+",G1)-1)/2,(SEARCH("-",G1)-1)/2)
    Where if the first character of the string (which is the last inning) is a "-" then it searches the string for the first "+" and vice versa searching for "-" where the last inning is a "+". Since the concatenated string is the game by inning in reverse order, this searches backwards through the game for the last time the losing team led the game. Subtracting 1 and dividing by 2 will get you how many innings back from the end of the game the final lead change occurred. Should be simple from there to translate that to an inning based on how long the game was. I've used rounddown to get rid of half innings caused by the zero having both signs and iserror to get rid of problems where a team takes the lead in the first inning and never gives it up. In that case it returns "firstinning".

    I'm not sure if I totally understand your problem, but I think the reverse concatenate here is the key to getting the "+5" instead of the "+2" in the example from your first post.
     
  10. jhp64

    jhp64 Member SoSH Member

    Messages:
    165
  11. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    Let me try explaining what I want to do one more time. Suppose the visiting team does not score in the top of the 1st but the home team scores 4 runs in the bottom of the 1st. Now, if the visiting team's final score is 3 runs, then effectively the home team won the game in the bottom of the 1st. The problem I am having with this is when the lead changes hands during a game. If a game goes into the bottom of the 9th or into extra-innings, you know when the winning run was scored and you also know in shut-outs; it's the other games.

    MATCH would do what I want except the array searched has to be sorted in ascending or ascending order.
     
  12. InstaFace

    InstaFace MDLzera

    Messages:
    5,920
    I know excel, I love excel, but people over-obsess about trying to make Excel do things that are algorithmically complex enough that it's no longer the right tool for the job. This sort of thing would be like 20-30 lines of code in PHP or Python or whatever your scripting language of choice would be. (or 200 lines of buggy VB, but leave that aside)

    If you can represent this logic as a bunch of for() loops and if()-else() statements, you should just code it as such.
     
  13. SirPsychoSquints

    SirPsychoSquints Member SoSH Member

    Messages:
    3,563
    Can you share a small extract? I think we could probably do it with some nested formulas. Like, if the losing team scores 4 runs, you want to identify the first inning that the winning team had accumulated 5 runs. It should be do-able.
     
  14. tonyandpals

    tonyandpals Well-Known Member Lifetime Member SoSH Member

    Messages:
    5,816
    Let's see some data!
     
  15. mt8thsw9th

    mt8thsw9th anti-SoSHal SoSH Member

    Messages:
    16,997
    So out of curiosity I was playing with this quickly, and it's really getting the "when did they win?" to populate, which I'm trying to figure out.

    [​IMG]

    I mean, obviously they won in the 7th when that tie turned into an "X", but I'm just trying to figure out the best way to do a nested IF playing off if they got the W or not, and have it populate the inning that the game was won.

    edit - basically need something that looks for, and returns the last instance of "Tie" turning into "X" and that should always be the inning that the game was won, even if it was the first. In my example it's the 7th.
     
    #15 mt8thsw9th, Nov 30, 2018
    Last edited: Nov 30, 2018
  16. Max Power

    Max Power thai good. you like shirt? SoSH Member

    Messages:
    3,307
    I see what you're trying to do and I think you're going about it in a convoluted way. If a team "wins" the game by scoring 4 runs at any point in the game when the other team scores 3, then you need to take the losing team's final run total and find the inning in which the winning team exceeded it. Lead changes are irrelevant in your definition.
     
  17. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    I've been hoping to stay away from nested IFs. I think you need to look at the point where the losing team has scored its final runs then find the inning when the winning team exceeds that total. I've got something I'm going to try but I have to think about it some more.
     
  18. SirPsychoSquints

    SirPsychoSquints Member SoSH Member

    Messages:
    3,563
    This shouldn't be that complicated. SHARE SOME DATA and I'll make a formula.
     
  19. tonyandpals

    tonyandpals Well-Known Member Lifetime Member SoSH Member

    Messages:
    5,816
    Seriously. Come ask for help. Multiple people want to help. Multiple people ask for data. Multiple responses from OP w/ no data.
     
  20. jhp64

    jhp64 Member SoSH Member

    Messages:
    165
    Does a team effectively win when they take the lead for good, or when they surpass the other team’s final run total? The link I posted gives at least one way to find the last place in a column where a condition is met, whether that’s the last time before team A took the lead for good, or the last time before team A’s run total became greater than team B’s final run total.
     
  21. loafnut

    loafnut Member SoSH Member

    Messages:
    52
    My idea of what you wanted was right. The above formula I posted works for what you have described here. You would just have to code with pluses and minuses which might be a pain. It’s convoluted to do this in excel imho and maybe a different tool would work better.
     
  22. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    I may have it figured out but I have to run a bunch more data through it. MATCH returns the location in a data array but it wants the data in ascending order and then returns the first, not last. It's going to be a pain because I'll have something like a million records to process. I'll post some data and the results when I get done.
     
  23. SirPsychoSquints

    SirPsychoSquints Member SoSH Member

    Messages:
    3,563
    We're asking for a small subset of the records so we can look at it and try to offer a solution, but I'm also interested in seeing the result, sure.
     
  24. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    I understand but it was also a weekend and I had plans.
     
  25. DanoooME

    DanoooME Well-Known Member Gold Supporter SoSH Member

    Messages:
    14,676
    Wait, you're using Excel for a million rows of data? InstaFace has it right; you're using the wrong tool for this problem. You must not be working with the entire data set, otherwise I would have expected a crash by now.
     
  26. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    No, I'm downloading comma-delimited files by decade (well, the years are zipped by decade) into Excel to make tables for an Access database.
     
  27. mt8thsw9th

    mt8thsw9th anti-SoSHal SoSH Member

    Messages:
    16,997
    You can't reverse your array, simply by having a row below with A1 = A9, A2 = A8, etc., and then run the MATCH on that?
     
  28. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    The syntax of MATCH is MATCH(lookup_value, lookup_array, [match_type]), where match_type =
    1 --- finds the largest value that is less than or equal to the lookup_value
    0 --- finds first value identical to lookup_value
    -1-- finds the smallest value that is greater than or equal to the lookup_value

    Finding a match_value equal to the lookup_value is a problem as it may find an inning in which the score is tied or the first inning in which identical scores were derived.

    This is some data:

     
  29. SirPsychoSquints

    SirPsychoSquints Member SoSH Member

    Messages:
    3,563
    So I can do some dumb nest If statements! I copied your A1 through R11. In Column T through X I titled them Vscore, Hscore, Needed to win & Winner, respectively. Vscore & Hscore are obvious,

    • Needed to win is: =MIN(T2:U2)+1
    • Winner is: =IF(T2>U2,"V","H")
    • Win Inning is the stupid one: =IF(W2="H",IF(AH2>=V2,1,IF(AI2>V2,2,IF(AJ2>V2,3,IF(AK2>V2,4,IF(AL2>V2,5,IF(AM2>V2,6,IF(AN2>V2,7,IF(AO2>V2,8,9)))))))),IF(Y2>=V2,1,IF(Z2>V2,2,IF(AA2>V2,3,IF(AB2>V2,4,IF(AC2>V2,5,IF(AD2>V2,6,IF(AE2>V2,7,IF(AF2>V2,8,9)))))))))
    • Where columns Y through AG are each the visitor's total runs after each inning 1 through 9 and AI through AP are the home team's total runs after each inning 1 through 9
    This only works for 9 inning games, but you said the extra inning part would be easy to append to this.

    For rows 2 through 11, I came up with the home team winning in innings:
    • 1
    • 9
    • 7
    • 8
    • 7
    • 2
    • 9
    • 6
    • 8
    • 9
     
  30. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    No, you need to look at lines 14-23, which show the cumulative scoring by inning, e.g., SUM($A14:A14), ..., SUM($A14:I14)...; SUM($J14:J14),...SUM($J14:R14).

    Then you find the first inning in which the Home Score in a line is greater than the first highest score in the Visitors line. I filled in those in the Home Score line in red (and they are listed numerically in column T opposite the first tables).
     
  31. SirPsychoSquints

    SirPsychoSquints Member SoSH Member

    Messages:
    3,563
    Im pretty sure my formulas did that without creating extra rows, but making extra columns instead.

    Edit: I mean, did you get different results than I did?
     
  32. tonyandpals

    tonyandpals Well-Known Member Lifetime Member SoSH Member

    Messages:
    5,816
    He has his results above and yes he got different results. 4th game for example is not won in the 8th as your set says, it's won in the. The 5th game is one in the 3rd, not 7th. I didn't check the rest.
     
  33. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    In Row 1, the next-to-last column (Col. T) on the right is Innwon, and that column contains the inning in which each game was won.

    I originally had some accompanying explanations in the Google Sheet but they caused columns to stretch out to very wide lengths, so I removed them and forgot to add them in my written explanation. Sorry.
     
  34. SirPsychoSquints

    SirPsychoSquints Member SoSH Member

    Messages:
    3,563
    I corrected my "Win Inning" fomula below. The problem was I used ">" instead of ">=" several times. I now get all the same results as you:

    =IF(W2="H",IF(AH2>=V2,1,IF(AI2>=V2,2,IF(AJ2>=V2,3,IF(AK2>=V2,4,IF(AL2>=V2,5,IF(AM2>=V2,6,IF(AN2>=V2,7,IF(AO2>=V2,8,9)))))))),IF(Y2>=V2,1,IF(Z2>=V2,2,IF(AA2>=V2,3,IF(AB2>=V2,4,IF(AC2>=V2,5,IF(AD2>=V2,6,IF(AE2>=V2,7,IF(AF2>=V2,8,9)))))))))
     
  35. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    As I remarked earlier, I was hoping to avoid nested IF statements. I thought there might be some Excel "trick" that I was overlooking. But, thanks.
     
  36. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    Yeah, once upon a time. I was quite good at programming...Fortran with assembly language subroutines, but that was a long time ago and I haven't done any programming since MS-Dos days when Basic was included, so it might be a little more for me than just scribbling out 20-30 lines of code. Admittedly, most people who use Retrosheet data for analysis do use code.

    And just as an aside to all this, each record in a Retrosheet's Game Log has 161 fields, of which only about 10 are needed for this study, and the main two--visitor and home inning-by-inning scoring--are presented as words--102300[12]00, where [12] would be a 12-run inning by one team. So, your script would either have to separate each of those words into columns by innings (variable in game length) and you can do what I do and open the comma-delimited file in Excel, deleted the unneeded fields, copy the scoring word fields to my text editor and run a Regular Expressions/Perl routine to convert then to comma-delimited, and re-copy them back to Excel, or if you are really good, then maybe you can do all of it by coding.
     
  37. InstaFace

    InstaFace MDLzera

    Messages:
    5,920
    that sounds like it would be 3-5x simpler by coding, in terms of total time spent getting the output you want. You can then format it using Excel if you like but at that point it's trivial.

    If you used to code, you've crossed the hardest barrier which is learning to think like a coder. If this is something you're spending your free time on, your time will be well rewarded by chugging through some basic Python documentation. I mean, if I have a string input like "102300[12]00", parsing that into an inningScore[x] array (in PHP) or List/Dict (Python) would seriously be like 10 lines of code max. Remember, a String is a sequence, every character is an entry in that sequence. Since I can assume that there are no 3+ digit inning scores, I can take the shortcut that anytime we hit brackets, the next two characters are the inning score (but making it generalized wouldn't be much harder).

    Code:
    i = 0
    inningScore = [] # NB - zero-based array, will have indices 0-8
    
    while i < len(gameString):
        if gameString[i] == '[':
            inningScore.append( int(gameString[i+1] + gameString[i+2]) )
            i += 3 # move past the close bracket
        else:
            inningScore.append( int(gameString) )
            i += 1
    This really is beginner-level stuff for real coding. You're frankly doing a lot more mental work trying to shoehorn this into Excel than you would by just biting the bullet and changing toolkits, even if it didn't feel that way at the start.
     
  38. charlieoscar

    charlieoscar Member

    Messages:
    1,075
    As I noted, I already have a mechanism for parsing the innings score words. I simply replace all inning scores like [12] with a letter unique to the score (102300[12]00 becomes 102300c00, insert a semi-colon after each letter/number, then replace the letters with their corresponding numbers. I use NoteTab Pro as a text editor and it has its own programming language built in, largely based on Regular Expressions. It also can act as a clipboard, so I can turn that on, select a column in Excel and hit Ctrl+C. Now when I switch to NoteTab, the parser I wrote will process several thosand lines in a second or two and I can just paste it back into Excel.

    As for using Excel, I also noted that each record from a Retrosheet Game Log will have 161 fields in it, of which I have to delete about 150. The data I get is comma-delimited (.csv) and the easiest way I know how to deal with that is simply opening it up in Excel. That also makes it easy to decline importing unwanted fields. Once all that was done, I figured there must be some reasonable way to manipulate the data (inning-by-inning scores) in Excel to get what I need. Unfortunately, I haven't succeeded in that part.

    This is a link to the Retrosheet page with Game Log data (and that also includes a link to a Guide to Retrosheet Game Logs). Data can be downlaoded by year, decades, or en masse.
    https://www.retrosheet.org/gamelogs/index.html

    Thanks
     

Share This Page